1. Introduction

1.1. The model. Suppose that we have $n$ independent and identically
distributed observations $(X_i, Y_i) \in \mathbb{R} \times \mathbb{R}$ from the regression model

(1.1) $Y_i = f(X_i) + \xi_i$,

where $f : \mathbb{R} \to \mathbb{R}$, the variables $\xi_i$ are centered Gaussian of variance $\sigma^2$ and
independent of $X_1, \dots, X_n$ (the design), and the $X_i$ are distributed with density $\mu$.
We want to recover $f$ at a chosen point $x_0$.

For instance, if we take the variables $(X_i)$ distributed with density

$\mu(x) = \dfrac{\beta + 1}{x_0^{\beta+1} + (1 - x_0)^{\beta+1}}\, |x - x_0|^{\beta}\, \mathbf{1}_{[0,1]}(x)$,

for $x_0 \in [0,1]$ and $\beta > -1$, then clearly when $\beta > 0$ this density models a lack
of information at $x_0$ and, conversely, an exploding amount of information if $-1 < \beta < 0$.
We want to understand the influence of the parameter $\beta$ on the amount of
information at $x_0$ in the minimax setup.
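The role of $\beta$ can be seen on simulated data. The sketch below is only an illustration (the function names `sample_design` and `expected_count` are ours): it samples from the density above by inverting its distribution function and counts the observations within distance $h$ of $x_0$. For $h \le \min(x_0, 1 - x_0)$ the expected count is $2n h^{\beta+1} / (x_0^{\beta+1} + (1 - x_0)^{\beta+1})$, which is much smaller than in the uniform case ($\beta = 0$) when $\beta > 0$, and much larger when $-1 < \beta < 0$.

```python
import numpy as np

def sample_design(n, x0, beta, rng):
    """Draw n points from mu(x) = c |x - x0|^beta on [0, 1] (beta > -1) by CDF inversion,
    where c = (beta + 1) / (x0^(beta+1) + (1 - x0)^(beta+1))."""
    p = beta + 1.0
    D = x0**p + (1.0 - x0) ** p            # so that c = p / D
    u = rng.uniform(size=n)
    m = x0**p / D                          # value of the CDF at x0
    left = u <= m
    x = np.empty(n)
    # On [0, x0]: F(x) = (x0^p - (x0 - x)^p) / D ; on [x0, 1]: F(x) = (x0^p + (x - x0)^p) / D.
    x[left] = x0 - np.maximum(x0**p - u[left] * D, 0.0) ** (1.0 / p)
    x[~left] = x0 + np.maximum(u[~left] * D - x0**p, 0.0) ** (1.0 / p)
    return np.clip(x, 0.0, 1.0)            # guard against rounding at the endpoints

def expected_count(n, h, x0, beta):
    """n P(|X - x0| <= h) = 2 n h^(beta+1) / D, valid for h <= min(x0, 1 - x0)."""
    p = beta + 1.0
    return 2.0 * n * h**p / (x0**p + (1.0 - x0) ** p)

rng = np.random.default_rng(0)
n, x0, h = 200_000, 0.5, 0.05
for beta in (2.0, 0.0, -0.5):
    x = sample_design(n, x0, beta, rng)
    observed = int(np.sum(np.abs(x - x0) <= h))
    print(f"beta={beta:+.1f}  observed={observed}  expected={expected_count(n, h, x0, beta):.0f}")
```

With $n = 200\,000$ and $h = 0.05$, the window around $x_0 = 1/2$ receives about 200 points for $\beta = 2$ against about 20 000 for the uniform design.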

1.2. Motivations. The pointwise estimation of the regression function is a 
well-known problem, which has been intensively studied by many authors. The first 
authors who computed the minimax rate over a nonparametric class of Hölderian
functions were Ibragimov and Hasminski (1981) and Stone (1977). Over the class
of Hölder functions with smoothness $s$, the local polynomial estimator converges
with the rate $n^{-s/(1+2s)}$ (see Stone (1977)) and this rate is optimal in the minimax
sense. Many authors worked on related problems: see, for instance, Korostelev and 
Tsybakov (1993), Nemirovski (2000), Tsybakov (2003). 

Nevertheless, these results require the design density to be non-vanishing and 
finite at the estimation point. This assumption roughly means that the information 
is spatially homogeneous. The next logical step is to look for the minimax risk at a 
point where the design density $\mu$ is vanishing or exploding. To achieve such a result,
it seems natural to consider several types of design density behaviour at $x_0$ and to
compute the corresponding minimax rates. Such results would improve the statistical
description of models (here in the minimax setup) with very inhomogeneous
information.

When $f$ has a Hölder type smoothness of order 2 and if $\mu(x) \sim x^{\beta}$ near 0, where
$\beta > 0$, Hall et al. (1997) show that a local linear procedure converges with the
rate $n^{-4/(5+\beta)}$ when estimating $f$ at 0. This rate is also proved to be optimal.
In a more general setup for the design and if the regression function is Lipschitz,
Guerre (1999) extends the result of Hall et al. to $\beta > -1$. Here, we intend to
develop the regression function estimation for degenerate designs in a systematic
way.

1.3. Organization of the paper. In Section 2 we present two theorems 
giving the pointwise minimax convergence rate in the model (1.1) for different 
design behaviours (Theorems 1 and 2). In Section 3 we construct an estimator 
and in Section 4 give upper bounds for this estimator (Propositions 4 and 5). In 
Section 5 we discuss some technical points. The proofs are delayed until Section 6,
and well-known facts about regular and $\Gamma$-variation are given in the Appendix.

2. Main Results 

Throughout this study we are in the minimax setup. We define the pointwise
minimax risk over a class $\Sigma$ by

(2.1) $\mathcal{R}_n(\Sigma, \mu) = \Big( \inf_{T_n} \sup_{f \in \Sigma} E^n_{f,\mu}\{ |T_n(x_0) - f(x_0)|^p \} \Big)^{1/p}$,

where $\inf_{T_n}$ is taken over all estimators $T_n$ based on the observations (1.1), with $x_0$
being the estimation point and $p > 0$. The expectation $E^n_{f,\mu}$ in (2.1) is taken with
respect to the joint probability distribution $P^n_{f,\mu}$ of the pairs $(X_i, Y_i)_{i=1,\dots,n}$.

2.1. Regular variation. The definition of regular variation and its
main properties are due to Karamata (1930). The main references on regular variation
are Bingham et al. (1989), Geluk and de Haan (1987), Resnick (1987), and
Seneta (1976).
Definition 1 (Regular variation). A continuous function $v : \mathbb{R}^+ \to \mathbb{R}^+$ is regularly
varying at 0 if there is a real number $\beta \in \mathbb{R}$ such that:

(2.2) $\forall y > 0, \quad \lim_{h \to 0^+} v(yh)/v(h) = y^{\beta}$.

We denote by $RV(\beta)$ the set of all the functions satisfying (2.2). A function in
$RV(0)$ is slowly varying.

Remark. Roughly, a regularly varying function behaves as a power function
times a slower term. Typical examples of such functions are $x^{\beta}$, $x^{\beta}(\log(1/x))^{\gamma}$ for
$\gamma \in \mathbb{R}$, and more generally any power function times a log or a composition of
log-functions to some power. For other examples, see the references cited above.
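The convergence in (2.2) can be checked numerically for the second example above. In this sketch (our own illustration; `rv_ratio` is a hypothetical helper) we take $\beta = 1.5$, $\gamma = 2$ and $y = 2$: the ratio $v(yh)/v(h)$ approaches $y^{\beta}$, but only slowly, because the log factor contributes $(\log(1/(yh))/\log(1/h))^{\gamma}$, which tends to 1 at a logarithmic speed.

```python
import math

def rv_ratio(v, y, h):
    """v(y h) / v(h); for v in RV(beta) this tends to y**beta as h -> 0+."""
    return v(y * h) / v(h)

beta, gam, y = 1.5, 2.0, 2.0
v = lambda x: x**beta * math.log(1.0 / x) ** gam   # x^beta (log(1/x))^gamma

for h in (1e-2, 1e-6, 1e-12, 1e-50):
    print(f"h={h:g}  ratio={rv_ratio(v, y, h):.4f}  limit={y**beta:.4f}")
```

Even at $h = 10^{-50}$ the relative gap to $2^{1.5}$ is still about one percent, which illustrates why slowly varying terms can matter at realistic sample sizes.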

2.2. The function class

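Definition 2. If $\delta > 0$ and $\omega \in RV(s)$ with $s > 0$, we define the class $\mathcal{F}_\delta(x_0, \omega)$
of functions $f : [0,1] \to \mathbb{R}$ such that

$\forall h \le \delta, \quad \inf_{P \in \mathcal{P}_k} \sup_{|x - x_0| \le h} |f(x) - P(x - x_0)| \le \omega(h)$,

where $k$ is the largest integer strictly smaller than $s$ and $\mathcal{P}_k$ is the set of all the real
polynomials of degree at most $k$. We define $\ell_\omega(h) = \omega(h) h^{-s}$, the slow variation term
of $\omega$. If $a > 0$, we define

$\mathcal{U}(a) = \{ f : [0,1] \to \mathbb{R} \text{ such that } \|f\|_\infty \le a \}$.

Finally, we define

$\Sigma_{\delta,a}(x_0, \omega) = \mathcal{F}_\delta(x_0, \omega) \cap \mathcal{U}(a)$.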

Remark. If we take $\omega(h) = r h^s$ for some $r > 0$, then we get the classical Hölder
regularity with radius $r$. In this sense, the class $\mathcal{F}_\delta(x_0, \omega)$ is a slight generalization
of Hölder regularity.

Assumption M. In what follows, we assume that there exists a neighbourhood
$W$ of $x_0$ and a continuous function $v : \mathbb{R}^+ \to \mathbb{R}^+$ such that:

(2.3) $\forall x \in W, \quad \mu(x) = v(|x - x_0|)$.

This assumption roughly means that close to $x_0$ there are as many observations
on the left of $x_0$ as on the right. All the following results can be extended easily to
the non-symmetric case, see Section 5.1.

2.3. Regularly varying design density. Theorem 1 gives the minimax
rate over the class $\Sigma$ (see Definition 2) for the estimation problem of $f$ at $x_0$ when
the design is regularly varying at this point.

We denote by $\mathcal{R}(x_0, \beta)$ the set of all the densities $\mu$ such that (2.3) holds with
$v \in RV(\beta)$ for a fixed neighbourhood $W$.

Theorem 1. If

• $(s, \beta) \in (0, +\infty) \times (-1, +\infty)$ or $(s, \beta) \in (0, 1] \times \{-1\}$,

• $\Sigma = \Sigma_{h_n, a_n}(x_0, \omega)$ with $\omega \in RV(s)$, $a_n = O(n^{\gamma})$ for some $\gamma > 0$ and $h_n$
given by (2.5),

• $\mu \in \mathcal{R}(x_0, \beta)$,

then we have

(2.4) $\mathcal{R}_n(\Sigma, \mu) \asymp \sigma^{2s/(1+2s+\beta)} n^{-s/(1+2s+\beta)} \ell_{\omega,v}(n^{-1})$ as $n \to +\infty$,

where $\ell_{\omega,v}$ is slowly varying and where $\asymp$ stands for the equality in order, up to
constants depending on $s$, $\beta$ and $p$ (see (2.1)) but not on $\sigma$. Moreover, the minimax
rate is equal to $\omega(h_n)$, where $h_n$ is the smallest solution to

(2.5) $\omega(h) = \dfrac{\sigma}{\sqrt{2n \int_0^h v(t)\, dt}}$.

Example. The simplest example is the non-degenerate design case ($0 < \mu(x_0) < +\infty$)
with the class $\Sigma$ equal to a Hölder ball ($\omega(h) = r h^s$, see Definition 2). This
is the common case found in the literature. In particular, in this case, the design
is slowly varying ($\beta = 0$, with the slow term constant and equal to $\lim_{x \to x_0} \mu(x)$).
Solving (2.5) leads to the classical minimax rate

$\sigma^{2s/(1+2s)}\, r^{1/(1+2s)}\, n^{-s/(1+2s)}$.
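Equation (2.5) rarely has a closed-form solution, but its left-hand side rewritten as $\omega(h)\sqrt{2nF(h)}$, with $F(h) = \int_0^h v(t)\,dt$, is increasing in $h$ for the examples of this section, so $h_n$ can be found by bisection. The sketch below (the helper `solve_bandwidth` is ours, and monotonicity is an assumption we only check for these examples) solves (2.5) on a logarithmic scale. For $\int_0^h v(t)\,dt = h^{\beta+1}$ and $\omega(h) = r h^s$ (the case $\beta = 0$ being the non-degenerate example with $\mu(x_0) = 1$), solving (2.5) by hand gives $\omega(h_n) = \sigma^{2s/(1+2s+\beta)} r^{(\beta+1)/(1+2s+\beta)} (2n)^{-s/(1+2s+\beta)}$, which the numerical root reproduces.

```python
import math

def solve_bandwidth(omega, F_v, n, sigma, lo=1e-300, hi=1.0, iters=200):
    """Root h_n of omega(h) * sqrt(2 n F_v(h)) = sigma, i.e. of (2.5), by bisection on log h.
    Assumes the left-hand side is increasing on [lo, hi], so the root is the smallest solution."""
    g = lambda h: omega(h) * math.sqrt(2.0 * n * F_v(h)) - sigma
    if g(hi) < 0:
        raise ValueError("no solution below hi; increase hi")
    for _ in range(iters):
        mid = math.exp(0.5 * (math.log(lo) + math.log(hi)))   # geometric midpoint
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return hi

r, s, sigma = 2.0, 1.5, 0.5
for beta in (0.0, 2.0, -0.5):
    for n in (10**3, 10**6):
        h_n = solve_bandwidth(lambda h: r * h**s, lambda h: h ** (beta + 1.0), n, sigma)
        a = 1.0 + 2.0 * s + beta
        closed = sigma ** (2 * s / a) * r ** ((beta + 1) / a) * (2.0 * n) ** (-s / a)
        print(beta, n, r * h_n**s, closed)
```

Bisection on $\log h$ rather than on $h$ keeps the search accurate when $h_n$ is many orders of magnitude below 1.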

Example. Let $\beta > -1$. We consider $v$ such that $\int_0^h v(t)\, dt = h^{\beta+1} (\log(1/h))^{\alpha}$
and $\omega(h) = r h^s (\log(1/h))^{\gamma}$, where $\alpha, \gamma$ are any real numbers. In this case, we find
that the minimax rate (see Section 6.5 for details) is

$\sigma^{2s/(1+2s+\beta)}\, r^{(\beta+1)/(1+2s+\beta)}\, n^{-s/(1+2s+\beta)} (\log n)^{(\gamma(1+\beta) - s\alpha)/(1+2s+\beta)}$.

We note that this rate has the form given by Theorem 1 with the slow term
$\ell_{\omega,v}(h) = (\log(1/h))^{(\gamma(1+\beta) - s\alpha)/(1+2s+\beta)}$. When $\gamma(1 + \beta) - s\alpha = 0$, there is no
slow term in the minimax rate, although there are slow terms in $v$ and $\omega$. Again,
if $\beta = 0$ and $\gamma = s\alpha$, we get the minimax rate of the first example, although the
terms $v$ and $\omega$ do not have the classical forms.

Example. Let $\beta = -1$, $\alpha > 1$, and $v(h) = h^{-1} (\log(1/h))^{-\alpha}$. Let $\omega$ be the same
as in the previous example with $0 < s \le 1$. Then the minimax convergence rate is

$\sigma n^{-1/2} (\log n)^{(\alpha - 1)/2}$.

This rate is almost the parametric estimation rate, up to the slow log factor.
This result is natural since the design is very "exploding": we have a lot of
information at $x_0$, thus we can estimate $f(x_0)$ very fast. Also, we note that the
regularity parameters of the regression function ($r$, $s$, and $\gamma$) have (asymptotically)
disappeared from the minimax rate.
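The same bisection idea applies to this example, where $\int_0^h v(t)\,dt = (\log(1/h))^{1-\alpha}/(\alpha - 1)$. The sketch below is our own check with the illustrative values $\alpha = 2$, $s = 1$, $\gamma = 0$, $r = \sigma = 1$ (the helper is redefined so that the block runs on its own): $\omega(h_n)$ divided by $\sigma n^{-1/2}(\log n)^{(\alpha-1)/2}$ stays nearly constant as $n$ grows, while $\sqrt{n}\,\omega(h_n)$ keeps increasing, so the log factor is really present.

```python
import math

def solve_bandwidth(omega, F_v, n, sigma, lo=1e-300, hi=0.5, iters=200):
    """Smallest root of omega(h) sqrt(2 n F_v(h)) = sigma (equation (2.5)), by bisection on
    log h; assumes the left-hand side is increasing on [lo, hi]."""
    g = lambda h: omega(h) * math.sqrt(2.0 * n * F_v(h)) - sigma
    for _ in range(iters):
        mid = math.exp(0.5 * (math.log(lo) + math.log(hi)))
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return hi

alpha, s, r, sigma = 2.0, 1.0, 1.0, 1.0
omega = lambda h: r * h**s                                           # gamma = 0
F_v = lambda h: math.log(1.0 / h) ** (1.0 - alpha) / (alpha - 1.0)   # beta = -1 design

for n in (10**4, 10**8, 10**12):
    rate = omega(solve_bandwidth(omega, F_v, n, sigma))
    reference = sigma * n**-0.5 * math.log(n) ** ((alpha - 1.0) / 2.0)
    print(n, rate, rate / reference, math.sqrt(n) * rate)
```

The upper end `hi = 0.5` keeps $\log(1/h)$ away from 0, where $F_v$ as written above would not be defined for $\alpha > 1$.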

2.4. $\Gamma$-varying design density. The regular variation framework includes
any design density behaving close to the estimation point as a polynomial
times a slow term. It does not include, for instance, a design with a behaviour
similar to $\exp(-1/|x - x_0|)$ and defined as 0 at $x_0$, since this function goes to 0 at
$x_0$ faster than any power function.

Such a local behaviour can model the situation where we have very little information.
This example naturally leads us to the framework of $\Gamma$-variation. In fact,
such a function belongs to the following class introduced by de Haan (1970).

Definition 3 ($\Gamma$-variation). A non-decreasing continuous function $v : \mathbb{R}^+ \to \mathbb{R}^+$
is $\Gamma$-varying if there exists a continuous function $\rho : \mathbb{R}^+ \to \mathbb{R}^+$ such that

(2.6) $\forall y \in \mathbb{R}, \quad \lim_{h \to 0^+} v(h + y\rho(h))/v(h) = \exp(y)$.

We denote by $\Gamma V(\rho)$ the class of all such functions. The function $\rho$ is called the
auxiliary function of $v$.

Remark. A function behaving like $\exp(-1/|x - x_0|)$ close to $x_0$ satisfies Assumption M
with $v(h) = \exp(-1/h)$, where $v \in \Gamma V(\rho)$ with $\rho(h) = h^2$.
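For this example the limit (2.6) can be checked exactly: $v(h + yh^2)/v(h) = \exp\big(y/(1 + yh)\big)$, which tends to $e^{y}$. The sketch below (ours; `gamma_ratio` is a hypothetical helper) evaluates the ratio from $\log v$ to avoid floating-point underflow of $\exp(-1/h)$ for small $h$. By contrast, $v(2h)/v(h) = \exp(1/(2h))$ blows up as $h \to 0^+$, which is another way to see that this $v$ is not regularly varying.

```python
import math

def gamma_ratio(log_v, rho, y, h):
    """v(h + y rho(h)) / v(h), computed from log v to avoid underflow; for v in GammaV(rho)
    it tends to exp(y) as h -> 0+."""
    return math.exp(log_v(h + y * rho(h)) - log_v(h))

log_v = lambda h: -1.0 / h          # v(h) = exp(-1/h)
rho = lambda h: h * h               # auxiliary function rho(h) = h^2

for h in (1e-1, 1e-2, 1e-4):
    print(h, gamma_ratio(log_v, rho, 1.0, h), math.e)
```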

Theorem 2. If

• $\Sigma = \Sigma_{h_n, a_n}(x_0, \omega)$, where $\omega \in RV(s)$ with $0 < s \le 1$, $h_n$ is given by (2.5)
and $a_n = O(r_n^{-\gamma})$ for some $\gamma > 0$, with $r_n = \omega(h_n)$,

• $\mu$ satisfies Assumption M with $v \in \Gamma V(\rho)$,

then

(2.7) $\mathcal{R}_n(\Sigma, \mu) \asymp \ell_{\omega,v}(n^{-1})$ as $n \to +\infty$,

where $\ell_{\omega,v}$ is slowly varying. Moreover, as in Theorem 1, the minimax rate is equal
to $\omega(h_n)$, where $h_n$ is the smallest solution to (2.5).

Example. Let $\mu$ satisfy Assumption M with $v(h) = \exp(-1/h^{\alpha})$ for $\alpha > 0$ and
$\omega(h) = r h^s$ for $0 < s \le 1$. It is an easy computation to see that $v$ belongs to the
class $\Gamma V(\rho)$ for the auxiliary function $\rho(h) = \alpha^{-1} h^{\alpha+1}$. In this case, we find that
the minimax rate (see Section 6.5 for details) is

$r (\log n)^{-s/\alpha}$.

As shown by Theorem 2, we find a very slow minimax rate in this example. We
note that the parameters $s$ and $\alpha$ are on the same scale.
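The slow rate of this example can be reproduced numerically. The sketch below is ours, with the illustrative values $\alpha = 1$, $s = 1/2$ and $r = \sigma = 1$: it computes $\int_0^h \exp(-t^{-\alpha})\,dt$ with Simpson's rule, solves (2.5) by bisection and divides $\omega(h_n)$ by $r(\log n)^{-s/\alpha}$. The ratio stays of order one and decreases slowly with $n$, in line with the rate holding up to constants and slowly varying corrections.

```python
import math
import numpy as np

def F_v(h, alpha, m=2000):
    """F_v(h) = int_0^h exp(-t^(-alpha)) dt by the composite Simpson rule (m even);
    the integrand extends continuously by 0 at t = 0."""
    t = np.linspace(0.0, h, m + 1)
    f = np.zeros(m + 1)
    f[1:] = np.exp(-t[1:] ** (-alpha))
    w = np.ones(m + 1)
    w[1:-1:2] = 4.0
    w[2:-1:2] = 2.0
    return h / (3.0 * m) * float(w @ f)

def minimax_rate(n, r, s, alpha, sigma=1.0, lo=1e-3, hi=1.0, iters=100):
    """omega(h_n) = r h_n^s where h_n solves (2.5): r h^s sqrt(2 n F_v(h)) = sigma.
    Plain bisection, since the left-hand side is increasing in h."""
    g = lambda h: r * h**s * math.sqrt(2.0 * n * F_v(h, alpha)) - sigma
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0 else (lo, mid)
    return r * hi**s

r, s, alpha = 1.0, 0.5, 1.0
for n in (10**4, 10**8, 10**12):
    rate = minimax_rate(n, r, s, alpha)
    print(n, rate, rate / (r * math.log(n) ** (-s / alpha)))
```

Even at $n = 10^{12}$ the rate is above 0.2, which makes the statement "very slow" concrete.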

3. Local Polynomial Estimation 

3.1. Introduction. For the proof of the upper bound in Theorem 1 we use
a local polynomial estimator. The local polynomial estimator is well known and has
been intensively studied (see Stone (1977), Fan and Gijbels (1996), Spokoiny (1998),
Tsybakov (2003), among many others). If $f$ is a smooth function at $x_0$, then it
is close to its Taylor polynomial. A function $f \in C^k(x_0)$ (the space of $k$ times
differentiable functions at $x_0$ with a continuous $k$-th derivative) is such that for any
$x$ close to $x_0$

(3.1) $f(x) \approx f(x_0) + f'(x_0)(x - x_0) + \dots + \dfrac{f^{(k)}(x_0)}{k!} (x - x_0)^k$.

Let $h > 0$ (the bandwidth) and $k \in \mathbb{N}$. We define $\phi_{j,h}(x) = \big( (x - x_0)/h \big)^j$ and the space
$V_{k,h} = \mathrm{Span}\{\phi_{j,h} ;\ j = 0, \dots, k\}$.
For a fixed non-negative function $K$ (the kernel) we define the weighted pseudo-scalar
product

(3.2) $\langle f, g \rangle_{h,K} := \sum_{i=1}^n f(X_i)\, g(X_i)\, K\Big( \dfrac{X_i - x_0}{h} \Big)$,

and the corresponding pseudo-norm $\| \cdot \|_{h,K} = \sqrt{\langle \cdot, \cdot \rangle_{h,K}}$ ($K \ge 0$). In view of (3.1)
it is natural to consider the estimator defined as the closest polynomial of degree $k$
to the observations $(Y_i)$ in the least squares sense, that is:

(3.3) $\hat f_h = \operatorname{argmin}_{g \in V_{k,h}} \| g - Y \|^2_{h,K}$.

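Then $\hat f_h(x_0)$ is the local polynomial estimator of $f$ at $x_0$. A necessary condition for
$\hat f_h$ to be the minimizer of (3.3) is that it solves the linear problem:

(3.4) find $\hat f \in V_{k,h}$ such that $\forall \phi \in V_{k,h}, \ \langle \hat f, \phi \rangle_{h,K} = \langle Y, \phi \rangle_{h,K}$.

The estimator $\hat f_h$ is then given by

(3.5) $\hat f_h = P_{\hat\theta_h}$,

where

(3.6) $P_\theta = \theta_0 \phi_{0,h} + \theta_1 \phi_{1,h} + \dots + \theta_k \phi_{k,h}$,

with $\hat\theta_h$ the solution, whenever it makes sense, of the linear system

(3.7) $\mathbf{X}_h^K \theta = \mathbf{Y}_h^K$,

where $\mathbf{X}_h^K$ is the symmetric matrix with entries

(3.8) $(\mathbf{X}_h^K)_{j,l} = \langle \phi_{j,h}, \phi_{l,h} \rangle_{h,K}, \quad 0 \le j, l \le k$,

and $\mathbf{Y}_h^K$ is the vector defined by

$\mathbf{Y}_h^K = \big( \langle Y, \phi_{j,h} \rangle_{h,K} ;\ 0 \le j \le k \big)$.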
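The system (3.7)-(3.8) translates directly into code. In the sketch below (function names are ours; the kernel is passed as a vectorized function), the Gram matrix and right-hand side are built from the unnormalized scalar product (3.2). Since $\phi_{0,h} \equiv 1$ and $\phi_{j,h}(x_0) = 0$ for $j \ge 1$, the estimate of $f(x_0)$ is the first coordinate $\hat\theta_0$. This is the plain estimator (3.7), without the correction introduced in Definition 4 below.

```python
import numpy as np

def local_poly_system(X, Y, x0, h, k, K):
    """Gram matrix X_h^K of (3.8) and vector Y_h^K for phi_{j,h}(x) = ((x - x0)/h)^j,
    using the unnormalised weighted scalar product (3.2)."""
    u = (X - x0) / h
    w = K(u)                                       # weights K((X_i - x0)/h)
    Phi = np.vander(u, k + 1, increasing=True)     # Phi[i, j] = u_i ** j = phi_{j,h}(X_i)
    gram = Phi.T @ (w[:, None] * Phi)              # <phi_j, phi_l>_{h,K}
    rhs = Phi.T @ (w * Y)                          # <Y, phi_j>_{h,K}
    return gram, rhs

def local_poly_estimate(X, Y, x0, h, k, K):
    """Estimate of f(x0) from (3.7): theta_0, since phi_{j,h}(x0) = 0 for j >= 1."""
    gram, rhs = local_poly_system(X, Y, x0, h, k, K)
    theta = np.linalg.solve(gram, rhs)             # raises LinAlgError if gram is singular
    return theta[0]

rect = lambda u: (np.abs(u) <= 1.0).astype(float)  # rectangular kernel K_R

rng = np.random.default_rng(0)
X = rng.uniform(size=2000)
Y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(2000)
print(local_poly_estimate(X, Y, 0.3, 0.1, 2, rect), np.sin(2 * np.pi * 0.3))
```

A useful sanity check is that, without noise, a polynomial of degree at most $k$ is reproduced exactly, because it belongs to $V_{k,h}$.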

We assume that the kernel $K$ satisfies the following assumptions:

Assumption K. Let $K$ be the rectangular kernel $K_R(x) = \mathbf{1}_{|x| \le 1}$ or a non-negative
function such that:

• $\mathrm{Supp}\, K \subset [-1, 1]$,

• $K$ is symmetric,

• $K_\infty = \sup_x K(x) \le 1$,

• there is some $\rho > 0$ and $\kappa > 0$ such that $\forall x, y, \ |K(x) - K(y)| \le \rho |x - y|^{\kappa}$.
Assumption K is satisfied by all the classical kernels used in nonparametric curve
smoothing. Let us define

(3.9) $N_{n,h} = \#\{X_i \text{ such that } X_i \in [x_0 - h, x_0 + h]\}$,

the number of observations in the interval $[x_0 - h, x_0 + h]$, and the random matrix

$\bar{\mathbf{X}}_h^K = N_{n,h}^{-1} \mathbf{X}_h^K$.

Denote by $\mathcal{X}_n = \sigma(X_1, \dots, X_n)$ the $\sigma$-algebra generated by the design. Note that $\mathbf{X}_h^K$
is measurable with respect to $\mathcal{X}_n$. The matrix $\bar{\mathbf{X}}_h^K$ is a "renormalization" of $\mathbf{X}_h^K$.
We show in Lemma 6 that this matrix is asymptotically non-degenerate with large
probability when the design is regularly varying.

For technical reasons, we introduce a slightly different version of the local polynomial
estimator. We introduce a "correction" term in the matrix $\mathbf{X}_h^K$.

Definition 4. Given some $h > 0$, we consider $\tilde f_h$ defined by (3.5) with $\tilde\theta_h$ the
solution, when it makes sense (if $N_{n,h} = 0$ we take $\tilde f_h = 0$), of the linear system

(3.10) $\tilde{\mathbf{X}}_h^K \theta = \mathbf{Y}_h^K$,

where

$\tilde{\mathbf{X}}_h^K = \mathbf{X}_h^K + N_{n,h}^{1/2} I_{k+1} \mathbf{1}_{\lambda(\bar{\mathbf{X}}_h^K) < N_{n,h}^{-1/2}}$,

with $\lambda(M)$ being the smallest eigenvalue of a matrix $M$ and $I_{k+1}$ denoting the
identity matrix in $\mathbb{R}^{k+1}$.

Remark. One can understand the definition of $\tilde{\mathbf{X}}_h^K$ as follows: in the "good"
case when $\bar{\mathbf{X}}_h^K$ is non-degenerate in the sense that its smallest eigenvalue is not too
small, we solve the system (3.7), while in the "bad" case we still have a control on
the smallest eigenvalue of $\tilde{\mathbf{X}}_h^K$, since we always have $\lambda(\tilde{\mathbf{X}}_h^K) \ge N_{n,h}^{1/2}$.
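A minimal sketch of the correction of Definition 4 (`corrected_gram` is our own helper, taking the Gram matrix $\mathbf{X}_h^K$ and the count $N_{n,h}$ as inputs): it compares the smallest eigenvalue of $\bar{\mathbf{X}}_h^K = N_{n,h}^{-1}\mathbf{X}_h^K$ with $N_{n,h}^{-1/2}$ and, in the bad case, adds $N_{n,h}^{1/2} I_{k+1}$. The usage lines illustrate the bound $\lambda(\tilde{\mathbf{X}}_h^K) \ge N_{n,h}^{1/2}$ on a degenerate example where all design points in the window sit at $x_0$, so that $\phi_{1,h}$ vanishes on the data.

```python
import numpy as np

def corrected_gram(gram, N):
    """tilde X_h^K of Definition 4: add N^{1/2} I_{k+1} when the smallest eigenvalue of
    bar X_h^K = gram / N is below N^{-1/2}; otherwise return gram unchanged."""
    if N == 0:
        raise ValueError("N_{n,h} = 0: Definition 4 sets the estimator to 0")
    lam = np.linalg.eigvalsh(gram / N)[0]       # eigvalsh sorts eigenvalues increasingly
    if lam < N ** -0.5:
        return gram + np.sqrt(N) * np.eye(gram.shape[0])
    return gram

N = 100
bad = N * np.array([[1.0, 0.0], [0.0, 0.0]])          # k = 1, all X_i in the window equal x0
good = N * np.array([[1.0, 0.0], [0.0, 1.0 / 3.0]])   # well-spread design, rectangular kernel
print(np.linalg.eigvalsh(corrected_gram(bad, N))[0])  # about sqrt(N) = 10
print(corrected_gram(good, N) is good)                 # the good case is left untouched
```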

3.2. Bias-variance equilibrium. A main result on the local polynomial
estimator is the bias-variance decomposition. This is a classical result presented
many times in different forms: see Cleveland (1979), Goldenshluger and Nemirovski
(1997), Korostelev and Tsybakov (1993), Spokoiny (1998), Stone (1980),
Tsybakov (1986, 2003). The version in Spokoiny (1998) is close to the one presented
here. The differences are mostly related to the fact that the design is random and
that we consider a modified version of the local polynomial estimator (see Definition 4).
We introduce the event

(3.11) $\Omega_h^K = \{X_1, \dots, X_n \text{ are such that } \lambda(\bar{\mathbf{X}}_h^K) \ge N_{n,h}^{-1/2} \text{ and } N_{n,h} > 0\}$.

Note that on $\Omega_h^K$ the matrix $\bar{\mathbf{X}}_h^K$ is invertible.

Proposition 1 (Bias-variance decomposition). Under Assumption K and if
$f \in \mathcal{F}_h(x_0, \omega)$, the following inequality holds on the event $\Omega_h^K$:

(3.12) $|\tilde f_h(x_0) - f(x_0)| \le \lambda^{-1}(\bar{\mathbf{X}}_h^K) \sqrt{k+1}\, K_\infty \big( \omega(h) + \sigma N_{n,h}^{-1/2} |\gamma_h| \big)$,

where $\gamma_h$ is, conditionally on $\mathcal{X}_n$, centered Gaussian such that $E^n_{f,\mu}\{\gamma_h^2 \mid \mathcal{X}_n\} \le 1$.

Remark. Inequality (3.12) holds conditionally on the design, on the event $\Omega_h^K$.
We will see that this event has a large probability in the regular variation framework.

3.3. Choice of the bandwidth. Now, as with any linear estimation
procedure, the problem is: how to choose the bandwidth $h$? In view of inequality
(3.12), a natural bandwidth choice is

(3.13) $H_n = \min\{ h > 0 : \omega(h) \ge \sigma N_{n,h}^{-1/2} \}$.

Such a bandwidth choice is well known, see, for instance, Guerre (2000). This
choice stabilizes the procedure, since it is sensitive to the design, which represents
in the model (1.1) the local amount of information. The estimator is then defined
by

$\hat f_n(x_0) = \tilde f_{H_n}(x_0)$,

where $\tilde f_h$ is given by Definition 4 and $H_n$ is defined by (3.13). The random bandwidth
$H_n$ is close in probability to the theoretical deterministic bandwidth $h_n$
defined by (2.5) in view of the following proposition.

Proposition 2. Under Assumption M and if $\omega \in RV(s)$ for some $s > 0$, for any
$0 < \varepsilon < 1/2$ there exists $0 < \eta < \varepsilon$ such that

$P^n_\mu\{ |H_n/h_n - 1| > \varepsilon \} \le 4 \exp\Big( -\dfrac{\eta^2}{1 + \eta/3}\, n F_v(h_n/2) \Big)$,

where $F_v(h) := \int_0^h v(t)\, dt$.

If $nF_v(h_n/2) \to +\infty$ as $n \to +\infty$ (this is the case when $v$ is regularly varying),
this inequality entails

$H_n = (1 + o_{P^n_\mu}(1)) h_n$,

where $o_P(1)$ stands for a sequence going to 0 in probability under a probability $P$.
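Because $N_{n,h}$ is a step function of $h$ and $\omega$ is increasing, the minimum in (3.13) can be computed exactly from the sorted distances $|X_i - x_0|$: on each step where $N_{n,h} = j$, the condition reads $h \ge \omega^{-1}(\sigma/\sqrt{j})$. The sketch below (ours; it assumes $\omega^{-1}$ is available, here for $\omega(h) = rh^s$) does this and, on one simulated uniform design on $[0,1]$ (so that $F_v(h) = h$ near $x_0 = 1/2$), compares $H_n$ with the deterministic $h_n = (\sigma/(r\sqrt{2n}))^{2/(1+2s)}$ solving (2.5), illustrating $H_n = (1 + o_P(1)) h_n$.

```python
import numpy as np

def bandwidth_H(X, x0, sigma, omega_inv):
    """H_n = smallest h > 0 with omega(h) >= sigma / sqrt(N_{n,h}), cf. (3.13).
    omega_inv is the inverse of the increasing, continuous function omega. The condition
    is monotone in h (omega increases, sigma / sqrt(N_{n,h}) decreases), so the first step
    of the counting function on which it can hold gives the minimum."""
    d = np.sort(np.abs(X - x0))                  # N_{n,h} = j for h in [d[j-1], d[j])
    n = d.size
    for j in range(1, n + 1):
        h = max(d[j - 1], omega_inv(sigma / np.sqrt(j)))
        if j == n or h < d[j]:                    # the condition holds inside step j
            return h
    raise ValueError("empty design")

rng = np.random.default_rng(0)
n, x0, sigma, r, s = 100_000, 0.5, 1.0, 1.0, 1.0
X = rng.uniform(size=n)                           # v = 1 near x0, hence F_v(h) = h
H = bandwidth_H(X, x0, sigma, lambda y: (y / r) ** (1.0 / s))
h_n = (sigma / (r * np.sqrt(2.0 * n))) ** (2.0 / (1.0 + 2.0 * s))   # solves (2.5)
print(H, h_n, H / h_n)
```

Ties among the distances are handled automatically: a step of zero length never passes the test `h < d[j]`.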

Proposition 3 motivates the regularly varying design choice. It makes a link
between the behaviour of the counting process $N_{n,h}$ (that appears in the variance
term of (3.12)) and the behaviour of $\mu$ close to $x_0$. Actually, the regular variation
property (see Definition 1) naturally appears under appropriate assumptions on
the asymptotic behaviour of $N_{n,h}$. Let us denote by $P^n_\mu$ the joint probability of the
variables $(X_i)$.

Proposition 3. If Assumption M holds with $v$ monotone, then the following
properties are equivalent:

(1) $v$ is regularly varying of index $\beta \ge -1$;

(2) there exist sequences of positive numbers $(\lambda_n)$ and $(\gamma_n)$ such that $\lim_n \gamma_n = 0$,
$\liminf_n n\lambda_n^{-1} > 0$, $\gamma_{n+1} \sim \gamma_n$ as $n \to +\infty$ and a continuous function $\phi : \mathbb{R}^+ \to \mathbb{R}^+$
such that for any $C > 0$:

$E^n_\mu\{ N_{n, C\gamma_n} \} \sim \phi(C) \lambda_n$ as $n \to +\infty$;

(3) there exist $(\lambda_n)$, $(\gamma_n)$, and $\phi$ as before such that for any $C > 0$ and $\varepsilon > 0$:

$\lim_{n \to +\infty} P^n_\mu\Big\{ \Big| \dfrac{N_{n, C\gamma_n}}{\phi(C) \lambda_n} - 1 \Big| > \varepsilon \Big\} = 0$.

The proof is delayed until Section 6. Mainly, it is a consequence of the sequence
characterization of regular variation (see the Appendix).

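4. Upper Bounds for $\hat f_{H_n}(x_0)$

4.1. Conditional on the design. When no assumptions on the behaviour
of the design density are made, we can work conditionally on the design. For $\lambda > 0$
we define the event

$E_\lambda = \{ \lambda_n \ge \lambda \}$,

where $\lambda_n = \lambda(\bar{\mathbf{X}}_{H_n}^K)$. Note that $E_\lambda \in \mathcal{X}_n$. We also define the constant
$m(p) = (2\pi)^{-1/2} \int (1 + |t|)^p \exp(-t^2/2)\, dt$.

Proposition 4. Under Assumption K, if $\lambda$ is such that $\lambda^2 N_{n,H_n} \ge 1$ and
$n \ge k + 1$, we have on $E_\lambda$:

$\sup_{f \in \mathcal{F}_{H_n}(x_0, \omega)} E^n_{f,\mu}\{ |\hat f_n(x_0) - f(x_0)|^p \mid \mathcal{X}_n \} \le m(p)\, \lambda^{-p} K_\infty^p (k + 1)^{p/2} R_n^p$,

where $R_n = \omega(H_n)$.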
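The constant $m(p)$ is the moment $E(1 + |Z|)^p$ of a standard Gaussian variable $Z$; since $E\{\gamma_h^2 \mid \mathcal{X}_n\} \le 1$ in Proposition 1, it bounds $E\{(1 + |\gamma_h|)^p \mid \mathcal{X}_n\}$. A small quadrature sketch (ours), checked against the closed forms $m(1) = 1 + \sqrt{2/\pi}$ and $m(2) = 2 + 2\sqrt{2/\pi}$:

```python
import math

def m(p, upper=12.0, steps=4000):
    """m(p) = (2 pi)^{-1/2} int_R (1 + |t|)^p exp(-t^2/2) dt = E(1 + |Z|)^p for Z ~ N(0, 1).
    Composite Simpson rule on [0, upper] (steps even), doubled by symmetry; the tail
    beyond `upper` is negligible for moderate p."""
    f = lambda t: (1.0 + t) ** p * math.exp(-0.5 * t * t)
    step = upper / steps
    total = f(0.0) + f(upper)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(i * step)
    return 2.0 * total * step / 3.0 / math.sqrt(2.0 * math.pi)

print(m(1), 1 + math.sqrt(2 / math.pi))
print(m(2), 2 + 2 * math.sqrt(2 / math.pi))
```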

4.2. When the design is regularly varying. Proposition 5 below
gives an upper bound for the estimator $\hat f_{H_n}(x_0)$ when the design density is regularly
varying. This proposition can be viewed as a deterministic counterpart to
Proposition 4.

Let $\lambda_{\beta,K}$ be the smallest eigenvalue of the symmetric and positive matrix $\mathbf{X}_{\beta,K}$ with
entries, for $0 \le j, l \le k$:

(4.1) $(\mathbf{X}_{\beta,K})_{j,l} = \dfrac{\beta + 1}{2} \big( 1 + (-1)^{j+l} \big) \int_0^1 y^{j+l+\beta} K(y)\, dy$.

Note that in view of Lemma 6 we have $\lambda_{\beta,K} > 0$.
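For a given kernel, $\lambda_{\beta,K}$ is easy to compute once the moments $\int_0^1 y^q K(y)\,dy$ are known in closed form. (The entries of (4.1) are the averages of $\phi_{j,h}\phi_{l,h}K$ when the rescaled distances $(X_i - x_0)/h$ have density $(\beta + 1)|y|^{\beta}/2$ on $[-1, 1]$.) The sketch below is ours and uses the rectangular kernel and $K(y) = 1 - y^2$ on $[-1, 1]$ as examples. For the rectangular kernel with $\beta = 0$ and $k = 1$ the matrix is $\mathrm{diag}(1, 1/3)$, so $\lambda_{0,K} = 1/3$.

```python
import numpy as np

def limit_matrix(beta, k, moment):
    """Matrix X_{beta,K} of (4.1); moment(q) must return int_0^1 y^q K(y) dy for q > -1."""
    M = np.zeros((k + 1, k + 1))
    for j in range(k + 1):
        for l in range(k + 1):
            if (j + l) % 2 == 0:               # 1 + (-1)^(j+l) vanishes for odd j + l
                M[j, l] = (beta + 1.0) * moment(j + l + beta)
    return M

def lambda_beta_K(beta, k, moment):
    """Smallest eigenvalue lambda_{beta,K} of the symmetric matrix X_{beta,K}."""
    return np.linalg.eigvalsh(limit_matrix(beta, k, moment))[0]

rect = lambda q: 1.0 / (q + 1.0)                      # K(y) = 1 on [0, 1]
epan = lambda q: 1.0 / (q + 1.0) - 1.0 / (q + 3.0)    # K(y) = 1 - y^2 on [0, 1]
for beta in (-0.5, 0.0, 2.0):
    print(beta, lambda_beta_K(beta, 2, rect), lambda_beta_K(beta, 2, epan))
```

Passing the moments in closed form avoids numerical quadrature of $y^{\beta}$, which is singular at 0 when $-1 < \beta < 0$.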

Proposition 5. Let $g > 1$ and let $h_n$ be defined by (2.5). Let $(a_n)$ be a sequence
of positive numbers such that $a_n = O(n^{\gamma})$ for some $\gamma > 0$. If $\mu \in \mathcal{R}(x_0, \beta)$ with
$\beta > -1$ and $\omega \in RV(s)$, we have for any $p > 0$:

(4.2) $\limsup_{n} \sup_{f \in \Sigma_{g h_n, a_n}(x_0, \omega)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \} \le C \lambda_{\beta,K}^{-p}$,

where $r_n = \omega(h_n)$ satisfies

$r_n \sim \sigma^{2s/(1+2s+\beta)} n^{-s/(1+2s+\beta)} \ell_{\omega,v}(1/n)$ as $n \to +\infty$,

with $\ell_{\omega,v}$ slowly varying, and where $C$ is proportional to $(k + 1)^{p/2} m(p) K_\infty^p$, with a factor depending on $s$ and $\beta$.

Remark. Under Hölder regularity with radius $r$ we have

$r_n \sim \sigma^{2s/(1+2s+\beta)} r^{(\beta+1)/(1+2s+\beta)} n^{-s/(1+2s+\beta)} \ell_v(1/n)$ as $n \to +\infty$,

where $\ell_v$ is slowly varying.

5. Discussion 

5.1. About Assumption M. As stated previously, Assumption M means
that the design distribution is symmetric around $x_0$ close to this point. When this
is not the case, and if there are two functions $v^- \in RV(\beta^-)$, $v^+ \in RV(\beta^+)$ for
$\beta^-, \beta^+ \ge -1$ and $\eta^-, \eta^+ > 0$ such that for any $x \in [x_0 - \eta^-, x_0 + \eta^+]$:

$\mu(x) = v^+(x - x_0) \mathbf{1}_{x_0 \le x \le x_0 + \eta^+} + v^-(x_0 - x) \mathbf{1}_{x_0 - \eta^- \le x < x_0}$,

we can easily prove that the minimax convergence rate is the faster of the two
possible ones, which is (2.4) for the choice $\beta = \beta^- \wedge \beta^+$. To prove the upper
bound we can use the same estimator as in Section 3 with a non-symmetric choice
of the bandwidth, or more roughly we can "throw away" the observations on the
side of $x_0$ corresponding to the largest index of regular variation (when $\mu$ is known).

5.2. On Theorem 1 and Propositions 4 and 5. Since we are interested
in the estimation of $f$ at $x_0$, we need only a regularity assumption in some neighbourhood
of this point. Note that the minimax risks are computed over a class
where the regularity assumption holds in an interval that shrinks as $n$ increases.

It appears that a natural choice of the size of this interval is the theoretical
bandwidth of estimation $h_n$, since it is the minimum we need for the proof of
the upper bounds. To state an upper bound with the "design-adaptive" estimator
$\hat f_{H_n}(x_0)$ (adaptive in the sense that it does not depend on the behaviour of the design density
close to $x_0$, via the parameter $\beta$ for instance), we need a smoothness control in a
slightly larger neighbourhood than $h_n$ (see the parameter $g$ in Proposition 5).

More precisely, to prove in Proposition 5 that $r_n$ is an upper bound, we use, in
particular, Proposition 2 with $\varepsilon = g - 1$ in order to control the random bandwidth
$H_n$ by $h_n$. Thus, the parameter $g$ is indispensable for the proof of Proposition 5.
Note that we do not need such a parameter in Theorem 1 since we use the estimator
with the deterministic bandwidth $h_n$ to prove the upper bound part of the theorem.
Of course, this estimator is unfeasible from a practical point of view since $h_n$ heavily
depends on $\mu$, which is hardly known in practice. This is the reason why we state
Proposition 5, which tells us that the estimator with the data-driven bandwidth
$H_n$ converges with the same rate.

5.3. On Theorem 2. In the $\Gamma$-variation framework, for the proof of the
upper bound part of Theorem 2 we use an estimator depending on $\mu$. Again, such
an estimator is unfeasible from a practical point of view. Anyway, this framework is
considered only for theoretical purposes, since from a practical point of view nothing
can be done in this case: there are no observations at the point of estimation. This
is precisely what Theorem 2 and the corresponding example tell us, in the sense
that the minimax rate is very slow.

5.4. About the $\Gamma$-varying design case. For the proof of the upper
bound part in Theorem 2 we can consider an estimator different from the classical
regressogram (see the proof of the theorem). If $K$ is a kernel satisfying Assumption
K, we define



where $h_n$ is defined by (2.5). The point is that since $\mathrm{Supp}\, K \subset [-1, 1]$, this estimator
makes a local average of the observations $Y_i$ such that
$X_i \in [x_0 - h - \rho(h), x_0 - h + \rho(h)] \cup [x_0 + h - \rho(h), x_0 + h + \rho(h)]$, which does not contain the point of estimation
$x_0$ for $n$ large enough, since $\lim_{h \to 0^+} \rho(h)/h = 0$ (see Appendix). In spite of
this, we can prove that $\hat f_n(x_0)$ converges with the rate $r_n$. We can understand this
as follows: since there is no information at $x_0$, the procedure actually "catches" the
information "far" from $x_0$. This fact shows that, again, the $\Gamma$-varying design is an
extreme case.

5.5. More technical remarks 

• About Assumption K, the first assumption is used to make the kernel $K$
localize the information around the point of estimation $x_0$ (see (3.2)). The last one
is technical and used in the proof of Lemma 6. The two other ones are used for the
sake of simplicity, since we only really need the kernel to be bounded from above.

• When $\beta = -1$, Theorem 1 holds only for small regularities $0 < s \le 1$. For
technical reasons, we were not able to prove the upper bound when $s > 1$ and
$\beta = -1$. More precisely, when $s \le 1$ we have $k = 0$ and in view of (3.4) it is clear
that the local polynomial estimator is a Nadaraya-Watson estimator defined by

$\hat f_h(x_0) = \dfrac{\sum_{i=1}^n Y_i K((X_i - x_0)/h)}{\sum_{i=1}^n K((X_i - x_0)/h)}$.

When $s > 1$, we have to use a local polynomial estimator of degree $k \ge 1$. The problem is then in
the asymptotic control of the smallest eigenvalue of $\bar{\mathbf{X}}_h^K$ (see Lemma 6) and to do
so we use an average (Abelian) transform property of regularly varying functions,
which is (see Appendix):



Thus the only way to have a limit for both cases is to assume $K(y) = O(|y|^{\eta})$ for
some $\eta > 0$, but the obtained upper bound rate in this case would be slower than
the lower bound.

6. Proofs 

6.1. Proof of the main results 
Proof of Theorem 1. First we prove the upper bound part of equation (2.4)
when $\beta > -1$. We consider the estimator $\hat f_n(x_0) = \tilde f_{h_n}(x_0)$, where $\tilde f_h$ is given
by Definition 4 with $h_n$ given by equation (2.5), and we define $r_n = \omega(h_n)$. Let
$0 < \varepsilon < 1/2$. We introduce the event

$B_{n,\varepsilon} := \{ |\lambda(\bar{\mathbf{X}}_{h_n}^K) - \lambda_{\beta,K}| \le \varepsilon \} \cap \Big\{ \Big| \dfrac{N_{n,h_n}}{2nF_v(h_n)} - 1 \Big| \le \varepsilon \Big\}$.

Since $\lim_n nF_v(h_n) = +\infty$ (see, for instance, Lemma 4), we have $B_{n,\varepsilon} \subset \Omega_{h_n}^K$ for $n$
large enough (see (3.11)) and, in particular, on the event $B_{n,\varepsilon}$ the matrix $\bar{\mathbf{X}}_{h_n}^K$ is
invertible. Then using Proposition 1 and since $f \in \mathcal{F}_{h_n}(x_0, \omega)$, we get:

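$|\hat f_n(x_0) - f(x_0)| \mathbf{1}_{B_{n,\varepsilon}} \le (\lambda_{\beta,K} - \varepsilon)^{-1} \sqrt{k+1}\, K_\infty \Big( \omega(h_n) + \dfrac{\sigma |\gamma_{h_n}|}{\sqrt{2(1 - \varepsilon) n F_v(h_n)}} \Big)$

$\le (\lambda_{\beta,K} - \varepsilon)^{-1} (1 - \varepsilon)^{-1/2} \sqrt{k+1}\, K_\infty\, \omega(h_n) (1 + |\gamma_{h_n}|)$,

where we last used the definition of $h_n$. Since, conditionally on $\mathcal{X}_n$, $\gamma_{h_n}$ is centered
Gaussian such that $E^n_{f,\mu}\{\gamma_{h_n}^2 \mid \mathcal{X}_n\} \le 1$, we get for any $p > 0$:

$\sup_{f \in \mathcal{F}_{h_n}(x_0, \omega)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{B_{n,\varepsilon}} \mid \mathcal{X}_n \} \le (\lambda_{\beta,K} - \varepsilon)^{-p} (1 - \varepsilon)^{-p/2} (k + 1)^{p/2} K_\infty^p\, m(p)$,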

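where $m(p)$ is defined in Section 4. Now we work on the complement $B^c_{n,\varepsilon}$. We use
Lemmas 2 and 6 to control the probability of $B^c_{n,\varepsilon}$ and we recall that $a_n = O(n^{\gamma})$
for some $\gamma > 0$. When $N_{n,h_n} = 0$ we have $\hat f_n(x_0) = 0$ by definition and then

$\sup_{f \in \mathcal{U}(a_n)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{B^c_{n,\varepsilon}} \} \le (a_n r_n^{-1})^p P^n_\mu\{ B^c_{n,\varepsilon} \} = o_n(1)$.

Then we assume $N_{n,h_n} > 0$. Using Lemma 3 we get:

$\sup_{f \in \mathcal{U}(a_n)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{B^c_{n,\varepsilon}} \}$
$\le 2^p r_n^{-p} \Big( \sqrt{E^n_{f,\mu}\{ |\hat f_n(x_0)|^{2p} \}} + a_n^p \Big) \sqrt{P^n_\mu\{ B^c_{n,\varepsilon} \}}$
$\le 2^p (a_n r_n^{-1})^p \big( \sqrt{n^p C_{\sigma,k,2p}} + 1 \big) \sqrt{P^n_\mu\{ B^c_{n,\varepsilon} \}} = o_n(1)$,

and thus we have proved that $r_n$ is an upper bound of the minimax risk (2.4) when
$\beta > -1$.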

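When $\beta = -1$ and $0 < s \le 1$, we have $k = 0$ and the matrix $\bar{\mathbf{X}}_{h_n}^K$ is $1 \times 1$
sized and equal to $N_{n,h_n}^{-1} \sum_{i=1}^n K((X_i - x_0)/h_n)$ (see equation (6.5)). The bias-variance
inequality (3.12) becomes in this case:

$|\hat f_n(x_0) - f(x_0)| \le (\bar{\mathbf{X}}_{h_n}^K)^{-1} K_\infty \big( \omega(h_n) + \sigma N_{n,h_n}^{-1/2} |\gamma_{h_n}| \big)$.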

Consider the event

$C_{n,\varepsilon} := \Big\{ \Big| \dfrac{N_{n,h_n}}{2nF_v(h_n)} - 1 \Big| \le \varepsilon \Big\} \cap \Big\{ \Big| \dfrac{\sum_{i=1}^n K((X_i - x_0)/h_n)}{2nF_v(h_n)} - K(0) \Big| \le \varepsilon \Big\}$.

We note that the probability of $C^c_{n,\varepsilon}$ is controlled by Lemma 2 and equation (6.8) in
Lemma 5. Then we can proceed as previously to prove that $r_n$ is an upper bound
when $\beta = -1$, and we have proved that $r_n$ is an upper bound for the left-hand side
of (2.4). Using Proposition 6 we also have that $r_n$ is a lower bound for the left-hand side
of (2.4). The conclusion follows from Lemma 4. □
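Proof of Theorem 2. The proof is similar to that of Theorem 1. For the proof of
the upper bound part in (2.7) we use the regressogram estimator defined by

$\hat f_n(x_0) = N_{n,h_n}^{-1} \sum_{i=1}^n Y_i \mathbf{1}_{|X_i - x_0| \le h_n}$ if $N_{n,h_n} > 0$, and $\hat f_n(x_0) = 0$ if $N_{n,h_n} = 0$.

Let $0 < \varepsilon < 1/2$. On the event $D_{n,\varepsilon} = \big\{ \big| N_{n,h_n}/(2nF_v(h_n)) - 1 \big| \le \varepsilon \big\}$ we clearly have
$N_{n,h_n} > 0$ and since $f \in \mathcal{F}_{h_n}(x_0, \omega)$, we have

$|\hat f_n(x_0) - f(x_0)| \le \omega(h_n) + \sigma N_{n,h_n}^{-1/2} |v_n| \le \omega(h_n) (1 - \varepsilon)^{-1/2} (1 + |v_n|)$,

where $v_n = \sigma^{-1} N_{n,h_n}^{-1/2} \sum_{i=1}^n \xi_i \mathbf{1}_{|X_i - x_0| \le h_n}$ is, conditionally on $\mathcal{X}_n$, standard Gaussian.
Then we get

$\sup_{f \in \mathcal{F}_{h_n}(x_0, \omega)} E^n_{f,\mu}\{ |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{D_{n,\varepsilon}} \} \le r_n^p (1 - \varepsilon)^{-p/2} m(p)$.

Now we work on $D^c_{n,\varepsilon}$. If $N_{n,h_n} = 0$, we get, using Lemma 2 and since $a_n = O(r_n^{-\gamma})$:

$\sup_{f \in \mathcal{U}(a_n)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{D^c_{n,\varepsilon}} \} \le (a_n r_n^{-1})^p P^n_\mu\{ D^c_{n,\varepsilon} \}$
$= O(r_n^{-p(\gamma+1)}) \exp\Big( -\dfrac{\varepsilon^2}{1 + \varepsilon/3}\, nF_v(h_n) \Big) = o_n(1)$.

If $N_{n,h_n} > 0$, since $|\hat f_n(x_0)| \le a_n + \sigma |v_n|$, we get

$\sup_{f \in \mathcal{U}(a_n)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{D^c_{n,\varepsilon}} \} \le 2^p (a_n r_n^{-1})^p \big( 1 + \sqrt{C_{\sigma,0,2p}} \big) \sqrt{P^n_\mu\{ D^c_{n,\varepsilon} \}} = o_n(1)$,

where $C_{\sigma,0,2p}$ is the constant of Lemma 3 (with $k = 0$), as in the proof of Theorem 1. Thus we have proved that $r_n$
is an upper bound. The lower bound is given by Proposition 6, and the conclusion
follows from Lemma 4. □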

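In the sequel, $\langle \cdot, \cdot \rangle$ denotes the Euclidean scalar product on $\mathbb{R}^{k+1}$,
$e_1 = (1, 0, \dots, 0) \in \mathbb{R}^{k+1}$, $\| \cdot \|_\infty$ stands for the sup norm in $\mathbb{R}^{k+1}$, and $\| \cdot \|$ stands
for the Euclidean norm in $\mathbb{R}^{k+1}$.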

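Proof of Proposition 1. On $\Omega_h^K$ we have in view of Definition 4 that $\tilde{\mathbf{X}}_h^K = \mathbf{X}_h^K$
and $\mathbf{X}_h^K$ is invertible. Let $0 < \varepsilon < 1/2$ and $n \ge 1$. We can find a polynomial $P_h^{f,\varepsilon}$
of order $k$ such that

$\sup_{|x - x_0| \le h} |f(x) - P_h^{f,\varepsilon}(x)| \le \inf_{P \in \mathcal{P}_k} \sup_{|x - x_0| \le h} |f(x) - P(x - x_0)| + \dfrac{\varepsilon}{\sqrt{n}}$.

In particular, with $h = 0$ we get $|f(x_0) - P_h^{f,\varepsilon}(x_0)| \le \varepsilon/\sqrt{n}$. Defining $\theta_h \in \mathbb{R}^{k+1}$ such
that $P_h^{f,\varepsilon} = P_{\theta_h}$ (see (3.6)) we get

$|\tilde f_h(x_0) - f(x_0)| \le \dfrac{\varepsilon}{\sqrt{n}} + |\langle \hat\theta_h - \theta_h, e_1 \rangle| = \dfrac{\varepsilon}{\sqrt{n}} + |\langle (\mathbf{X}_h^K)^{-1} \mathbf{X}_h^K (\hat\theta_h - \theta_h), e_1 \rangle|$.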
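Then we have for $j \in \{0, \dots, k\}$ by (3.4) and (1.1):

$\big( \mathbf{X}_h^K (\hat\theta_h - \theta_h) \big)_j = \langle \hat f_h - P_h^{f,\varepsilon}, \phi_{j,h} \rangle_{h,K} = \langle Y - P_h^{f,\varepsilon}, \phi_{j,h} \rangle_{h,K}$
$= \langle f - P_h^{f,\varepsilon}, \phi_{j,h} \rangle_{h,K} + \langle Y - f, \phi_{j,h} \rangle_{h,K}$
$= \langle f - P_h^{f,\varepsilon}, \phi_{j,h} \rangle_{h,K} + \langle \xi, \phi_{j,h} \rangle_{h,K} = B_{h,j} + V_{h,j}$,

thus $\mathbf{X}_h^K (\hat\theta_h - \theta_h) = B_h + V_h$. In view of Assumption K and since $f \in \mathcal{F}_h(x_0, \omega)$,
we have:

$|B_{h,j}| = |\langle f - P_h^{f,\varepsilon}, \phi_{j,h} \rangle_{h,K}| \le N_{n,h} K_\infty \big( \omega(h) + \varepsilon/\sqrt{n} \big)$,

thus $\|B_h\|_\infty \le N_{n,h} K_\infty (\omega(h) + \varepsilon/\sqrt{n})$. Moreover, since $\lambda^{-1}(\bar{\mathbf{X}}_h^K) \le N_{n,h}^{1/2} \le n^{1/2}$ on
$\Omega_h^K$, we have:

$|\langle (\mathbf{X}_h^K)^{-1} B_h, e_1 \rangle| \le \| (\mathbf{X}_h^K)^{-1} \| \, \|B_h\| \le \| (\mathbf{X}_h^K)^{-1} \| \sqrt{k+1}\, \|B_h\|_\infty$
$\le \lambda^{-1}(\bar{\mathbf{X}}_h^K) \sqrt{k+1}\, K_\infty\, \omega(h) + \sqrt{k+1}\, K_\infty\, \varepsilon$,

where we last used the fact that $\|M^{-1}\| = \lambda^{-1}(M)$ for a positive symmetric matrix.
The variance term $V_h$ is clearly, conditionally on $\mathcal{X}_n$, a centered Gaussian
vector, and its covariance matrix is equal to $\sigma^2 \mathbf{X}_h^{K^2}$. Thus the random variable
$\langle (\mathbf{X}_h^K)^{-1} V_h, e_1 \rangle$ is, conditionally on $\mathcal{X}_n$, centered Gaussian with variance:

$v_h^2 = \sigma^2 \langle e_1, (\mathbf{X}_h^K)^{-1} \mathbf{X}_h^{K^2} (\mathbf{X}_h^K)^{-1} e_1 \rangle \le \sigma^2 \langle e_1, (\mathbf{X}_h^K)^{-1} \mathbf{X}_h^K (\mathbf{X}_h^K)^{-1} e_1 \rangle$
$= \sigma^2 \langle e_1, (\mathbf{X}_h^K)^{-1} e_1 \rangle \le \sigma^2 \| (\mathbf{X}_h^K)^{-1} \| = \sigma^2 N_{n,h}^{-1} \lambda^{-1}(\bar{\mathbf{X}}_h^K)$,

since $K^2 \le K$ (recall that $K_\infty \le 1$). Then $\lambda(\bar{\mathbf{X}}_h^K) = \inf_{\|x\| = 1} \langle x, \bar{\mathbf{X}}_h^K x \rangle \le \|\bar{\mathbf{X}}_h^K e_1\| \le \sqrt{k+1}$, since $\bar{\mathbf{X}}_h^K$ is
symmetric and its entries are smaller than 1 in absolute value. Thus

$v_h^2 \le \sigma^2 N_{n,h}^{-1} \lambda^{-1}(\bar{\mathbf{X}}_h^K) \le \sigma^2 N_{n,h}^{-1} (k + 1) \lambda^{-2}(\bar{\mathbf{X}}_h^K)$,

and the proposition follows. □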

Proof of Proposition 2. The proposition is a direct consequence of Lemmas 1 
and 2. □ 

Proof of Proposition 3. (2) $\Rightarrow$ (1): In view of Assumption M one has for $n$ large
enough

$E^n_\mu\{ N_{n, C\gamma_n} \} = 2n \int_0^{C\gamma_n} v(x)\, dx = 2n F_v(C\gamma_n)$,

thus (2) entails $2n\lambda_n^{-1} F_v(C\gamma_n) \to \phi(C)$ as $n \to +\infty$ and then $F_v \in RV(\alpha)$ for some $\alpha$ in view
of the characterization (A.8) of regular variation. Since $F_v(0) = 0$, we have more
precisely $F_v \in RV(\alpha)$ for some $\alpha \ge 0$, and since $v$ is monotone, we have $v \in RV(\alpha - 1)$
(see Appendix).
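(3) $\Rightarrow$ (2): Let $\varepsilon > 0$. We define the event

$A_n(C, \varepsilon) := \Big\{ \Big| \dfrac{N_{n, C\gamma_n}}{\phi(C) \lambda_n} - 1 \Big| \le \varepsilon \Big\}$.

Then:

$\lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} \} = \lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} ( \mathbf{1}_{A_n(C,\varepsilon)} + \mathbf{1}_{A_n(C,\varepsilon)^c} ) \}$

and then $\limsup_n \lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} \} \le (1 + \varepsilon) \phi(C)$. On the other hand,

$\lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} \} \ge \lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} \mathbf{1}_{A_n(C,\varepsilon)} \} \ge (1 - \varepsilon) \phi(C) P^n_\mu\{ A_n(C, \varepsilon) \}$,

and then $\liminf_n \lambda_n^{-1} E^n_\mu\{ N_{n, C\gamma_n} \} \ge (1 - \varepsilon) \phi(C)$.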

(1) $\Rightarrow$ (3): Let $v \in RV(\beta)$ and $0 < \varepsilon < 1/2$. If $\beta > -1$, we have $F_v \in RV(\beta + 1)$
(see the Appendix), thus we can write $F_v(h) = h^{\beta+1} \ell_F(h)$, where $\ell_F$ is slowly
varying. We define $\gamma_n = n^{-1/(2(\beta+1))}$ when $\beta > -1$ and $\gamma_n = n^{-1}$ if $\beta = -1$. When
$\beta = -1$, we have $F_v \in RV(0)$ (see Appendix). We note that in both cases we have
$\lim_n \gamma_n = 0$ and $\gamma_{n+1} \sim \gamma_n$ as $n \to +\infty$. In view of Lemma 2 we get for $n$ large
enough

where we used the fact that $\ell_F$ is slowly varying and where we defined $\lambda_n = 2nF_v(\gamma_n)$
and $\phi(C) = C^{\beta+1}$. Then we clearly have $\lim_n n\lambda_n^{-1} = +\infty$ and the
proposition follows. □

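6.2. Proof of the upper bounds for $\hat f_{H_n}(x_0)$

Proof of Proposition 4. Since $E_\lambda \subset \Omega_{H_n}^K$, (3.13) and Proposition 1 entail that
uniformly in $f \in \mathcal{F}_{H_n}(x_0, \omega)$ we have

$|\hat f_n(x_0) - f(x_0)| \le \lambda^{-1} \sqrt{k+1}\, K_\infty R_n (1 + |\gamma_{H_n}|)$,

where $\gamma_{H_n}$ is, conditionally on $\mathcal{X}_n$, centered Gaussian such that
$E^n_{f,\mu}\{ \gamma_{H_n}^2 \mid \mathcal{X}_n \} \le 1$. The result follows by integration with respect to $P^n_{f,\mu}(\cdot \mid \mathcal{X}_n)$. □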

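Proof of Proposition 5. Let us define $\varepsilon = g - 1$. We can assume without loss of
generality that $\varepsilon < \lambda_{\beta,K}/2$. We consider the event $A_{n,\varepsilon}$ from Lemma 6. In view
of this lemma we have $A_{n,\varepsilon} \subset E_{\lambda_{\beta,K} - \varepsilon} \cap \{ (1 - \varepsilon) h_n \le H_n \le (1 + \varepsilon) h_n \}$ and then
$\mathcal{F}_{g h_n}(x_0, \omega) \subset \mathcal{F}_{H_n}(x_0, \omega)$. Thus using Proposition 4 we get

$\sup_{f \in \mathcal{F}_{g h_n}(x_0, \omega)} E^n_{f,\mu}\{ |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{A_{n,\varepsilon}} \mid \mathcal{X}_n \}$
$\le m(p) (\lambda_{\beta,K} - \varepsilon)^{-p} K_\infty^p (k + 1)^{p/2} R_n^p$
$\le m(p) (\lambda_{\beta,K} - \varepsilon)^{-p} K_\infty^p (k + 1)^{p/2} (1 + \varepsilon)^{p(s+1)} r_n^p$,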






S. Gaiffas 



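where we used equation (6.1) in the same way as in the proof of Lemma 1 to obtain
on $A_{n,\varepsilon}$ that $\omega(H_n) \le (1 + \varepsilon)^{s+1} \omega(h_n)$. On the complementary event $A^c_{n,\varepsilon}$, using
inequality (6.11) and Lemma 3 and since $a_n = O(n^{\gamma})$ for some $\gamma > 0$, we get

$\sup_{f \in \mathcal{U}(a_n)} E^n_{f,\mu}\{ r_n^{-p} |\hat f_n(x_0) - f(x_0)|^p \mathbf{1}_{A^c_{n,\varepsilon}} \}$
$\le 2^p (a_n r_n^{-1})^p \big( \sqrt{n^p C_{\sigma,k,2p}} + 1 \big) \sqrt{P^n_\mu\{ A^c_{n,\varepsilon} \}} = o_n(1)$,

and (4.2) follows. The equivalent of $r_n$ is given by Lemma 4. □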

6.3. Lemmas for the proof of the upper bounds 

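Lemma 1. If $\omega \in \mathrm{RV}(s)$ for some $s > 0$, then for any $0 < \varepsilon < \frac12$ there exists $0 < \eta < \varepsilon$ such that

$$\Big\{\Big|\frac{N_{n,(1-\varepsilon)h_n}}{2nF_\nu((1-\varepsilon)h_n)} - 1\Big| \le \eta\Big\} \cap \Big\{\Big|\frac{N_{n,(1+\varepsilon)h_n}}{2nF_\nu((1+\varepsilon)h_n)} - 1\Big| \le \eta\Big\} \subset \{(1-\varepsilon)h_n \le H_n \le (1+\varepsilon)h_n\}.$$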



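Proof. In view of (3.13) we have $\{H_n \le (1+\varepsilon)h_n\} = \{N_{n,(1+\varepsilon)h_n} \ge \sigma^2\omega^{-2}((1+\varepsilon)h_n)\}$. Define $\varepsilon_1 = 1 - (1-\varepsilon^2)^{-2}(1+\varepsilon)^{-2s}$. For $\varepsilon$ small enough, it is clear that $\varepsilon_1 > 0$. We recall that $\ell_\omega$ stands for the slowly varying term of $\omega$ (see Definition 2). Since (A.1) holds uniformly on each compact set in $(0,+\infty)$, we have for $n$ large enough that for any $y \in [\frac12, \frac32]$:

$$(1-\varepsilon^2)\ell_\omega(h_n) \le \ell_\omega(yh_n) \le (1+\varepsilon^2)\ell_\omega(h_n), \tag{6.1}$$

so using (6.1) with $y = 1+\varepsilon$ ($\varepsilon < \frac12$), we obtain in view of (2.5):

$$\begin{aligned} 2(1-\varepsilon_1)nF_\nu((1+\varepsilon)h_n) &\ge (1-\varepsilon^2)^{-2}(1+\varepsilon)^{-2s}\sigma^2\omega^{-2}(h_n) \\ &= \sigma^2((1+\varepsilon)h_n)^{-2s}(1-\varepsilon^2)^{-2}\ell_\omega^{-2}(h_n) \\ &\ge \sigma^2\omega((1+\varepsilon)h_n)^{-2}, \end{aligned}$$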

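and then

$$\{N_{n,(1+\varepsilon)h_n} \ge 2(1-\varepsilon_1)nF_\nu((1+\varepsilon)h_n)\} \subset \{H_n \le (1+\varepsilon)h_n\}.$$

Using again (6.1) with $y = 1-\varepsilon$ we get in the same way

$$\{N_{n,(1-\varepsilon)h_n} \le 2(1+\varepsilon_1)nF_\nu((1-\varepsilon)h_n)\} \subset \{H_n \ge (1-\varepsilon)h_n\},$$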

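and then

$$\Big\{\Big|\frac{N_{n,(1-\varepsilon)h_n}}{2nF_\nu((1-\varepsilon)h_n)} - 1\Big| \le \varepsilon_1\Big\} \cap \Big\{\Big|\frac{N_{n,(1+\varepsilon)h_n}}{2nF_\nu((1+\varepsilon)h_n)} - 1\Big| \le \varepsilon_1\Big\} \subset \{(1-\varepsilon)h_n \le H_n \le (1+\varepsilon)h_n\}.$$

Now the result follows for the choice $\eta = \varepsilon \wedge \varepsilon_1$. □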
Lemma 2. Under Assumption M, we have for any $\varepsilon, h > 0$:

$$\mathbb{P}^n_\mu\Big\{\Big|\frac{N_{n,h}}{2nF_\nu(h)} - 1\Big| > \varepsilon\Big\} \le 2\exp\Big(-\frac{\varepsilon^2}{1+\varepsilon/3}\,nF_\nu(h)\Big).$$

Proof. It suffices to apply the Bernstein inequality to the sum of the independent random variables $Z_i = \mathbf{1}_{|X_i - x_0| \le h} - \mathbb{P}^n_\mu\{|X_i - x_0| \le h\}$ for $i = 1, \ldots, n$. □
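To see the concentration in Lemma 2 at work, the following sketch simulates $N_{n,h}$ for a hypothetical design in which $|X_i - x_0|$ has density $(\beta+1)t^\beta$ on $[0,1]$, so that $\mathbb{P}(|X_i - x_0| \le h) = 2F_\nu(h)$ with $F_\nu(h) = h^{\beta+1}/2$. It compares the empirical frequency of the deviation event with the bound $2\exp(-\varepsilon^2 nF_\nu(h)/(1+\varepsilon/3))$ that Bernstein's inequality gives in this setting ($|Z_i| \le 1$, $\mathrm{Var}(Z_i) \le 2F_\nu(h)$). The function name `lemma2_check` and all numerical values are illustrative choices, not part of the paper.

```python
import math
import random


def lemma2_check(n=1000, h=0.2, beta=1.0, eps=0.3, trials=500, seed=0):
    """Frequency of {|N_{n,h} / (2 n F_nu(h)) - 1| > eps} versus a Bernstein bound.

    Hypothetical design: mu(x0 + t) = nu(|t|) on [-1, 1] with nu(t) = (beta + 1) t^beta / 2,
    so that F_nu(h) = h^(beta + 1) / 2 and P(|X_i - x0| <= h) = 2 F_nu(h).
    """
    rng = random.Random(seed)
    F = h ** (beta + 1) / 2.0
    p = 2.0 * F  # probability that one design point falls in [x0 - h, x0 + h]
    exceed = 0
    for _ in range(trials):
        # N_{n,h} is a sum of n independent Bernoulli(p) indicators
        count = sum(1 for _ in range(n) if rng.random() < p)
        if abs(count / (2.0 * n * F) - 1.0) > eps:
            exceed += 1
    freq = exceed / trials
    # Bernstein with |Z_i| <= 1, sum of variances <= 2 n F, deviation t = 2 n F eps
    bound = 2.0 * math.exp(-eps ** 2 / (1.0 + eps / 3.0) * n * F)
    return freq, bound


freq, bound = lemma2_check()
print(f"empirical frequency {freq:.3f}, Bernstein bound {bound:.3f}")
```

The bound is typically far from tight for moderate $nF_\nu(h)$; what matters in the proofs is its exponential decay in $nF_\nu(h)$.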

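Lemma 3. For any $p > 0$ and $h > 0$ the estimator $\hat f_h$ (see Definition 4) satisfies

$$\sup_{f\in U(a)} \mathbb{E}^n_{f,\mu}\{|\hat f_h(x_0)|^p \mid \mathcal{X}_n\} \le C_{\sigma,k,p}\,\big(a\sqrt{N_{n,h}} \vee 1\big)^p,$$

where $C_{\sigma,k,p} = (k+1)^{p/2}\sqrt{2/\pi}\int_{\mathbb{R}_+}(1+\sigma t)^p\exp(-t^2/2)\,dt$.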

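Proof. When $N_{n,h} = 0$, we have $\hat f_h = 0$ by definition and the result is obvious, so we assume $N_{n,h} > 0$. Using the fact that $\lambda(A+B) \ge \lambda(A) + \lambda(B)$ when $A$ and $B$ are symmetric non-negative matrices, we get $\lambda(\tilde{\mathbf X}_h) \ge N_{n,h}^{1/2} > 0$, thus $\tilde{\mathbf X}_h$ is invertible.

Equation (3.10) entails $|\hat f_h(x_0)| = |\langle(\tilde{\mathbf X}_h)^{-1}\mathbf Y_h, e_1\rangle|$. In view of (1.1) we can decompose, for $j \in \{0, \ldots, k\}$:

$$(\mathbf Y_h)_j = \langle Y, \phi_{j,h}\rangle_{h,K} = \langle f, \phi_{j,h}\rangle_{h,K} + \langle \xi, \phi_{j,h}\rangle_{h,K} = B_{h,j} + V_{h,j}.$$

Since $f \in U(a)$, we have under Assumption K that $|B_{h,j}| \le aN_{n,h}$, thus $\|B_h\|_\infty \le aN_{n,h}$. As in the proof of Proposition 1 we have that $\langle(\tilde{\mathbf X}_h)^{-1}V_h, e_1\rangle$ is, conditionally on $\mathcal{X}_n$, centered Gaussian with variance

$$v^2 = \sigma^2\langle e_1, (\tilde{\mathbf X}_h)^{-1}\mathbf X_h(\tilde{\mathbf X}_h)^{-1}e_1\rangle \le \sigma^2\|(\tilde{\mathbf X}_h)^{-1}\|^2\,\|\mathbf X_h\|.$$

Assumption K entails that all the elements of the matrix $\mathbf X_h$ are smaller than $N_{n,h}$, thus $\|\mathbf X_h\| \le (k+1)N_{n,h}$. Since $\tilde{\mathbf X}_h$ is symmetric, we get $\|(\tilde{\mathbf X}_h)^{-1}\| = \lambda^{-1}(\tilde{\mathbf X}_h) \le N_{n,h}^{-1/2}$, and then $v^2 \le \sigma^2(k+1)$. Finally, we have

$$\begin{aligned}|\hat f_h(x_0)| &\le |\langle(\tilde{\mathbf X}_h)^{-1}B_h, e_1\rangle| + |\langle(\tilde{\mathbf X}_h)^{-1}V_h, e_1\rangle| \\ &\le \|(\tilde{\mathbf X}_h)^{-1}\|\,\|B_h\| + \sigma\sqrt{k+1}\,|\gamma_h| \le \sqrt{k+1}\,\big(a\sqrt{N_{n,h}} + \sigma|\gamma_h|\big),\end{aligned}$$

where $\gamma_h$ is, conditionally on $\mathcal{X}_n$, centered Gaussian with variance smaller than 1. The result follows by integrating with respect to $\mathbb{P}^n_{f,\mu}(\cdot \mid \mathcal{X}_n)$. □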
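For readers who want to experiment, here is a minimal sketch of a plain local polynomial estimator: it builds the Gram matrix with entries $\sum_i u_i^{j+l}K(u_i)$, $u_i = (X_i - x_0)/h$ (the moments $K_{n,h,j+l}$ of (6.5) below), together with the vector with entries $\sum_i Y_i u_i^j K(u_i)$, and returns the first coordinate of the solution, i.e. $\langle \mathbf X_h^{-1}\mathbf Y_h, e_1\rangle$. It is an assumption that this matches Definition 4, which is not reproduced here. In particular, any modification of the Gram matrix that the paper uses to guarantee invertibility is not implemented, and the sketch simply returns 0 when the window is empty. The helper names are hypothetical.

```python
import math
import random


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("singular Gram matrix")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def rectangular(u):
    return 1.0 if abs(u) <= 1.0 else 0.0


def local_poly_estimate(xs, ys, x0, h, k, kernel=rectangular):
    """Degree-k local polynomial estimate of f(x0), bandwidth h."""
    m = k + 1
    X = [[0.0] * m for _ in range(m)]
    Y = [0.0] * m
    for x, y in zip(xs, ys):
        u = (x - x0) / h
        w = kernel(u)
        if w == 0.0:
            continue
        for j in range(m):
            Y[j] += y * u ** j * w
            for l in range(m):
                X[j][l] += u ** (j + l) * w
    if X[0][0] == 0.0:
        return 0.0  # no design point in the window: the estimate is set to 0
    return solve_linear(X, Y)[0]


rng = random.Random(1)
xs = [rng.random() for _ in range(500)]
ys = [math.sin(3 * x) + 0.1 * rng.gauss(0, 1) for x in xs]
print(local_poly_estimate(xs, ys, x0=0.5, h=0.15, k=1), math.sin(1.5))
```

On noise-free polynomial data of degree at most $k$ the estimate is exact, which is a convenient check of the linear algebra.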

Lemma 4. If $\nu \in \mathrm{RV}(\beta)$, $\omega \in \mathrm{RV}(s)$ for $s > 0$ and the sequence $(h_n)$ is defined by (2.5), then the rate $r_n = \omega(h_n)$ satisfies

$$r_n \sim c_{s,\beta}\,\sigma^{2s/(1+2s+\beta)}\,n^{-s/(1+2s+\beta)}\,\ell_{\omega,\nu}(1/n) \quad\text{as } n \to +\infty, \tag{6.2}$$

where $\ell_{\omega,\nu}$ is slowly varying and $c_{s,\beta} = 4^{-s/(1+2s+\beta)}$. When $\omega(h) = rh^s$ (Hölder regularity) for $r > 0$, we have more precisely:

$$r_n \sim c_{s,\beta}\,\sigma^{2s/(1+2s+\beta)}\,r^{(\beta+1)/(1+2s+\beta)}\,n^{-s/(1+2s+\beta)}\,\ell_{s,\nu}(1/n) \quad\text{as } n \to +\infty, \tag{6.3}$$

where $\ell_{s,\nu}$ is slowly varying. It is noteworthy that when $\beta = -1$ the result becomes:

$$r_n \sim \tfrac12\,\sigma n^{-1/2}\,\ell_{s,\nu}(1/n) \quad\text{as } n \to +\infty.$$

When $\nu \in \Gamma\mathrm{V}(\rho)$, we have

$$r_n \sim \ell_{\omega,\nu}(1/n), \tag{6.4}$$

where $\ell_{\omega,\nu}$ is slowly varying.

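Proof. Denote $F_\nu(h) = \int_0^h \nu(t)\,dt$ and let $G(h) = \omega^2(h)F_\nu(h)$. When $\beta > -1$, we have $F_\nu \in \mathrm{RV}(\beta+1)$ (see the Appendix) and when $\beta = -1$, $F_\nu$ is slowly varying. Thus $G \in \mathrm{RV}(1+2s+\beta)$ for any $\beta \ge -1$. The function $G$ is continuous and such that $\lim_{h\to0^+}G(h) = 0$ in view of (A.2), since $1+2s+\beta > 0$. Then, for $n$ large enough, $h_n = G^{\leftarrow}(\sigma^2/(4n))$, where $G^{\leftarrow}(h) = \inf\{y \ge 0 \mid G(y) \ge h\}$ is the generalized inverse of $G$. Then in view of (A.7) we have $G^{\leftarrow} \in \mathrm{RV}(1/(1+2s+\beta))$ and then $\omega\circ G^{\leftarrow} \in \mathrm{RV}(s/(1+2s+\beta))$ (see the Appendix). Thus we can write $\omega\circ G^{\leftarrow}(h) = h^{s/(1+2s+\beta)}\ell_{\omega,\nu}(h)$, where $\ell_{\omega,\nu}$ is a slowly varying function. Thus:

$$\begin{aligned} r_n = \omega\big(G^{\leftarrow}(\sigma^2/(4n))\big) &= \Big(\frac{\sigma^2}{4n}\Big)^{s/(1+2s+\beta)}\ell_{\omega,\nu}\Big(\frac{\sigma^2}{4n}\Big) \\ &\sim c_{s,\beta}\,\sigma^{2s/(1+2s+\beta)}n^{-s/(1+2s+\beta)}\ell_{\omega,\nu}(1/n) \quad\text{as } n \to +\infty, \end{aligned}$$

since $\ell_{\omega,\nu}$ is slowly varying. When $\omega(h) = rh^s$, we can write more precisely $h_n = G^{\leftarrow}(\sigma^2/(4r^2n))$, where $G(h) = h^{2s}F_\nu(h)$, so (6.2) and (6.3) follow.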

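Let $y \in \mathbb{R}$. Using (A.9) and the uniformity in (A.1) we get $\lim_{h\to0^+}\ell_\omega(h + y\rho(h))/\ell_\omega(h) = 1$, thus $\lim_{h\to0^+}\omega(h + y\rho(h))/\omega(h) = 1$. Moreover, since $\Gamma\mathrm{V}(\rho)$ is stable under integration (see the Appendix) we have $F_\nu \in \Gamma\mathrm{V}(\rho)$, thus $\lim_{h\to0^+}G(h + y\rho(h))/G(h) = \exp(y)$ and then $G \in \Gamma\mathrm{V}(\rho)$. For $n$ large enough, $h_n$ is well defined and given by $h_n = G^{\leftarrow}(\sigma^2/(4n))$. Since $G^{\leftarrow} \in \Pi\mathrm{V}(\ell)$ for $\ell = \rho\circ G^{\leftarrow} \in \mathrm{RV}(0)$ (see the Appendix), $G^{\leftarrow}$ belongs, in particular, to $\mathrm{RV}(0)$ in view of (A.11) and then $r_n = \omega\circ G^{\leftarrow}(\sigma^2/(4n))$, where $\omega\circ G^{\leftarrow} \in \mathrm{RV}(0)$. Thus $r_n \sim \omega\circ G^{\leftarrow}(n^{-1})$ as $n \to +\infty$ and (6.4) follows with $\ell_{\omega,\nu} = \omega\circ G^{\leftarrow}$. □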
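The rate in Lemma 4 can be checked numerically. The sketch below computes $h_n = G^{\leftarrow}(\sigma^2/(4n))$ by bisection, which is a simple way to evaluate the generalized inverse (A.6) of a continuous nondecreasing $G$, and compares $\omega(h_n)$ with the right-hand side of (6.3). It assumes the hypothetical example $\omega(h) = rh^s$ and $\nu(h) = h^\beta$ with $\beta > -1$, for which $F_\nu(h) = h^{\beta+1}/(\beta+1)$, the slowly varying factor reduces to the constant $(\beta+1)^{s/(1+2s+\beta)}$, and the constant $4^{-s/(1+2s+\beta)}$ comes from the level $\sigma^2/(4n)$ used in the proof.

```python
def generalized_inverse(G, y, lo=0.0, hi=1.0, iters=200):
    """G^{<-}(y) = inf{h >= 0 : G(h) >= y} for a nondecreasing G on [lo, hi], by bisection."""
    if G(hi) < y:
        raise ValueError("level y is not reached on [lo, hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if G(mid) >= y:
            hi = mid
        else:
            lo = mid
    return hi


def rate_numeric(n, sigma, r, s, beta):
    """r_n = omega(h_n) with h_n = G^{<-}(sigma^2 / (4 n)) and G = omega^2 F_nu.

    Hypothetical example: omega(h) = r h^s, nu(h) = h^beta (beta > -1),
    hence F_nu(h) = h^(beta + 1) / (beta + 1).
    """
    G = lambda h: (r * h ** s) ** 2 * h ** (beta + 1) / (beta + 1)
    h_n = generalized_inverse(G, sigma ** 2 / (4.0 * n))
    return r * h_n ** s


def rate_closed_form(n, sigma, r, s, beta):
    """Right-hand side of (6.3); here the slowly varying factor is (beta + 1)^(s / d)."""
    d = 1.0 + 2.0 * s + beta
    c = 4.0 ** (-s / d)
    ell = (beta + 1.0) ** (s / d)
    return c * sigma ** (2 * s / d) * r ** ((beta + 1) / d) * n ** (-s / d) * ell


for beta in (-0.5, 0.0, 2.0):
    print(beta, rate_numeric(10_000, 1.0, 2.0, 1.0, beta), rate_closed_form(10_000, 1.0, 2.0, 1.0, beta))
```

With a genuinely slowly varying factor (for instance a logarithmic term in $\nu$) the two columns would only agree asymptotically, which is what (6.2) asserts.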

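Study of the terms $\lambda(\bar{\mathbf X}_{h_n})$ and $\lambda(\bar{\mathbf X}_{H_n})$. We recall that the matrix $\mathbf X_{h,K}$ is defined as the symmetric and non-negative matrix with entries $(\mathbf X_{h,K})_{j,l} = K_{n,h,j+l}$ for $0 \le j, l \le k$, where:

$$K_{n,h,a} \triangleq \sum_{i=1}^n\Big(\frac{X_i-x_0}{h}\Big)^aK\Big(\frac{X_i-x_0}{h}\Big) \tag{6.5}$$

for $a \in \mathbb{N}$. Define $\bar K_{n,h,a} \triangleq N_{n,h}^{-1}K_{n,h,a}$ and

$$K_{a,\beta} \triangleq (1+(-1)^a)\int_0^1 y^{a+\beta}K(y)\,dy. \tag{6.6}$$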
We define for any $\varepsilon > 0$ the event

$$D_{n,h,a,K,\varepsilon} \triangleq \Big\{\Big|\frac{K_{n,h,a}}{nF_\nu(h)} - (\beta+1)K_{a,\beta}\Big| \le \varepsilon\Big\}.$$

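Lemma 5. Let $a \in \mathbb{N}$ and $\varepsilon > 0$. Under Assumption K and if $\mu \in \mathcal{R}(x_0,\beta)$ with $\beta > -1$, then for any positive sequence $(\gamma_n)$ going to 0 we have, for $n$ large enough,

$$\mathbb{P}^n_\mu\{D^c_{n,\gamma_n,a,K,\varepsilon}\} \le 2\exp\Big(-\frac{\varepsilon^2}{8(2+\varepsilon/3)}\,nF_\nu(\gamma_n)\Big). \tag{6.7}$$

When $\beta = -1$ we have:

$$\mathbb{P}^n_\mu\Big\{\Big|\frac{K_{n,\gamma_n,0}}{nF_\nu(\gamma_n)} - 2K(0)\Big| > \varepsilon\Big\} \le 2\exp\Big(-\frac{\varepsilon^2}{8(2+\varepsilon/3)}\,nF_\nu(\gamma_n)\Big). \tag{6.8}$$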



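Proof. First we prove (6.7). We define $Q_{i,n,a} \triangleq \big(\frac{X_i-x_0}{\gamma_n}\big)^aK\big(\frac{X_i-x_0}{\gamma_n}\big)$ and $Z_{i,n,a} \triangleq Q_{i,n,a} - \mathbb{E}^n_\mu\{Q_{i,n,a}\}$. Since $\mu \in \mathcal{R}(x_0,\beta)$, one has for $i = 1, \ldots, n$:

where we used Assumption K and the fact that $[x_0 - \gamma_n, x_0 + \gamma_n] \subset W$ for $n$ large enough. Then equations (A.3) and (A.4) entail:

$$\lim_{n\to+\infty}\frac{1}{F_\nu(\gamma_n)}\mathbb{E}^n_\mu\{Q_{i,n,a}\} = (\beta+1)K_{a,\beta},$$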

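and for $n$ large enough:

$$D^c_{n,\gamma_n,a,K,\varepsilon} \subset \Big\{\Big|\sum_{i=1}^n Z_{i,n,a}\Big| > \frac{\varepsilon}{2}\,nF_\nu(\gamma_n)\Big\}. \tag{6.9}$$

In view of Assumption K we have $\mathbb{E}^n_\mu\{Z_{i,n,a}\} = 0$, $|Z_{i,n,a}| \le 2$, and

$$b_n^2 \triangleq \sum_{i=1}^n\mathbb{E}^n_\mu\{Z^2_{i,n,a}\} \le n\,\mathbb{E}^n_\mu\{Q^2_{i,n,a}\} \le 2nF_\nu(\gamma_n).$$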

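Since the $Z_{i,n,a}$ are independent, we can apply Bernstein's inequality. If $\tau_n = \frac{\varepsilon}{2}nF_\nu(\gamma_n)$, equation (6.9) and Bernstein's inequality entail:

$$\mathbb{P}^n_\mu\{D^c_{n,\gamma_n,a,K,\varepsilon}\} \le 2\exp\Big(-\frac{\tau_n^2}{2(b_n^2 + 2\tau_n/3)}\Big) \le 2\exp\Big(-\frac{\varepsilon^2}{8(2+\varepsilon/3)}\,nF_\nu(\gamma_n)\Big),$$

thus (6.7) follows. The proof of equation (6.8) is similar. When $\beta = -1$, we have $\nu(t) = t^{-1}\ell_\nu(t)$. Define $Z_{i,n} = Q_{i,n,0} - \mathbb{E}^n_\mu\{Q_{i,n,0}\}$. In view of equation (A.5) we have

$$\lim_{n\to+\infty}\frac{1}{F_\nu(\gamma_n)}\mathbb{E}^n_\mu\{Q_{i,n,0}\} = \lim_{n\to+\infty}\frac{2}{F_\nu(\gamma_n)}\int_0^{\gamma_n}K(t/\gamma_n)\,\ell_\nu(t)\,\frac{dt}{t} = 2K(0) > 0.$$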
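Then for $n$ large enough one has

$$\Big\{\Big|\frac{K_{n,\gamma_n,0}}{nF_\nu(\gamma_n)} - 2K(0)\Big| > \varepsilon\Big\} \subset \Big\{\Big|\sum_{i=1}^n Z_{i,n}\Big| > \frac{\varepsilon}{2}\,nF_\nu(\gamma_n)\Big\}.$$

The $Z_{i,n}$ are independent and centered and $|Z_{i,n}| \le 2$. Moreover, in view of Assumption K we have as before $b_n^2 = \sum_{i=1}^n\mathbb{E}^n_\mu\{Z_{i,n}^2\} \le 2nF_\nu(\gamma_n)$, and using again the Bernstein inequality we get (6.8). □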
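A quick Monte Carlo illustration of the limit used in the proof of Lemma 5, namely that $K_{n,\gamma,a}/(nF_\nu(\gamma))$ is close to $(\beta+1)K_{a,\beta}$, with $K_{a,\beta}$ as in (6.6). The design $\mu(x_0+t) = \nu(|t|)$ with $\nu(t) = (\beta+1)t^\beta/2$ on $[-1,1]$ and the kernel $K = \mathbf{1}_{[-1,1]}$ are hypothetical choices; for them the expectation equals the limit exactly (no slowly varying correction), so only the sampling error remains.

```python
import random


def normalized_moment(n, h, a, beta, seed=0):
    """Monte Carlo value of K_{n,h,a} / (n F_nu(h)) with the kernel K = 1 on [-1, 1].

    Hypothetical design: mu(x0 + t) = nu(|t|) on [-1, 1], nu(t) = (beta + 1) t^beta / 2,
    hence F_nu(h) = h^(beta + 1) / 2 for 0 < h <= 1.
    """
    rng = random.Random(seed)
    F = h ** (beta + 1) / 2.0
    total = 0.0
    for _ in range(n):
        t = rng.random() ** (1.0 / (beta + 1))  # |X_i - x0| has cdf t^(beta + 1)
        if rng.random() < 0.5:
            t = -t                               # design symmetric around x0
        u = t / h
        if abs(u) <= 1.0:
            total += u ** a                      # Q_{i,n,a} for this kernel
    return total / (n * F)


def limit_value(a, beta):
    """(beta + 1) K_{a,beta} from (6.6) when K = 1 on [0, 1]."""
    return (beta + 1) * (1 + (-1) ** a) / (a + beta + 1)


for a in range(3):
    print(a, normalized_moment(100_000, 0.3, a, 1.0), limit_value(a, 1.0))
```

Odd moments vanish in the limit because the design and the kernel are symmetric around $x_0$, which is the role of the factor $1+(-1)^a$ in (6.6).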

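Lemma 6. Let Assumption K hold. Assume that $\omega \in \mathrm{RV}(s)$ with $s > 0$, $\mu \in \mathcal{R}(x_0,\beta)$ with $\beta > -1$, and $\lambda_{\beta,K}$ is defined by equation (4.1). We have $\lambda_{\beta,K} > 0$ and for any $0 < \varepsilon < \frac12$ we can find an event $A_{n,\varepsilon}$ such that for $n$ large enough

$$A_{n,\varepsilon} \subset \{|\lambda(\bar{\mathbf X}_{H_n}) - \lambda_{\beta,K}| \le \varepsilon\} \cap \{|\lambda(\bar{\mathbf X}_{h_n}) - \lambda_{\beta,K}| \le \varepsilon\} \cap \{(1-\varepsilon)h_n \le H_n \le (1+\varepsilon)h_n\} \tag{6.10}$$

and

$$\mathbb{P}^n_\mu\{A^c_{n,\varepsilon}\} \le 4(k+2)\exp\big(-c_{\beta,K,\varepsilon}\,\omega^{-2}(h_n)\big), \tag{6.11}$$

where $c_{\beta,K,\varepsilon} > 0$.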

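Proof. Since $\lambda_{\beta,K}$ is the smallest eigenvalue of $\mathbf X_{\beta,K}$, we have $\lambda_{\beta,K} > 0$; otherwise, defining $p(y) = (1, y, \ldots, y^k)$ and since $\mathbf X_{\beta,K}$ is symmetric, we would have

$$0 = \lambda_{\beta,K} = \langle \mathbf x_0, \mathbf X_{\beta,K}\mathbf x_0\rangle = \int_{-1}^1\big(\mathbf x_0^\top p(y)\big)^2|y|^\beta K(y)\,dy,$$

where $\mathbf x_0$ is the normalized eigenvector associated to the eigenvalue $\lambda_{\beta,K}$ and where we used the fact that

$$\lambda(M) = \inf_{\|x\|=1}\langle x, Mx\rangle \tag{6.12}$$

for any symmetric matrix $M$. Then for all $y \in \operatorname{Supp}K$ we have $\mathbf x_0^\top p(y) = 0$, which leads to a contradiction since $y \mapsto \mathbf x_0^\top p(y)$ is a nonzero polynomial. For any $h, \varepsilon > 0$ we introduce the events: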



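$$A_{n,h,\varepsilon} \triangleq \{|\lambda(\bar{\mathbf X}_h) - \lambda_{\beta,K}| \le \varepsilon\}, \qquad B_{n,h,a,\varepsilon} \triangleq \{|\bar K_{n,h,a} - K_{a,\beta}| \le \varepsilon\}. \tag{6.13}$$

Using the characterization (6.12) we can easily prove that

$$\bigcap_{a=0}^{2k} B_{n,h,a,\varepsilon/(k+1)^2} \subset A_{n,h,\varepsilon}. \tag{6.14}$$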
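Since, for $i = 1, \ldots, n$,

$$\begin{aligned}\Big(\frac{X_i-x_0}{H_n}\Big)^aK\Big(\frac{X_i-x_0}{H_n}\Big) - \Big(\frac{X_i-x_0}{h_n}\Big)^aK\Big(\frac{X_i-x_0}{h_n}\Big) &= \Big(\frac{X_i-x_0}{H_n}\Big)^a\Big[K\Big(\frac{X_i-x_0}{H_n}\Big) - K\Big(\frac{X_i-x_0}{h_n}\Big)\Big] \\ &\quad + \Big[\Big(\frac{h_n}{H_n}\Big)^a - 1\Big]\Big(\frac{X_i-x_0}{h_n}\Big)^aK\Big(\frac{X_i-x_0}{h_n}\Big),\end{aligned}$$

we obtain, when $K$ is the rectangular kernel $K_R$ and otherwise under Assumption K, an upper bound for $|\bar K_{n,H_n,a} - \bar K_{n,h_n,a}|$ that only involves the ratios $N_{n,H_n}/N_{n,h_n}$ and $H_n/h_n$, and that goes to 0 when both ratios go to 1.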

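Let us introduce for $\varepsilon > 0$ the events

$$C_{n,\varepsilon} \triangleq \{(1-\varepsilon)h_n \le H_n \le (1+\varepsilon)h_n\}, \qquad F_{n,\varepsilon} \triangleq \Big\{\Big|\frac{N_{n,H_n}}{N_{n,h_n}} - 1\Big| \le \varepsilon\Big\}.$$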

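Then for a good choice of $\varepsilon_1 < \varepsilon$ we have $|\bar K_{n,H_n,a} - \bar K_{n,h_n,a}| \le \frac{\varepsilon}{3(k+1)^2}$ on the event $C_{n,\varepsilon_1} \cap F_{n,\varepsilon_1}$. Since $K \le 1$, we have $(\beta+1)K_{a,\beta} \le 2$, and noting that $D_{n,h,0,K_R,\varepsilon_1} = \{|\frac{N_{n,h}}{2nF_\nu(h)} - 1| \le \varepsilon_1\}$, we have for any $a \in \mathbb{N}$

$$D_{n,h,0,K_R,\frac{\varepsilon}{3(k+1)^2+\varepsilon}} \cap D_{n,h,a,K,\frac{\varepsilon}{3(k+1)^2+\varepsilon}} \subset B_{n,h,a,\frac{2\varepsilon}{3(k+1)^2}}.$$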

Using (6.14) we get for $\eta = \frac{\varepsilon}{3(k+1)^2+\varepsilon}$

$$D_{n,h_n,0,K_R,\eta} \cap \bigcap_{a=0}^{2k} D_{n,h_n,a,K,\eta} \subset A_{n,h_n,\varepsilon}. \tag{6.15}$$



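We take $0 < \varepsilon_2 < \varepsilon_1$ such that $\big(\frac{1+\varepsilon_2}{1-\varepsilon_2}\big)^{\beta+1} < 1+\varepsilon_1$ (for $\varepsilon_1$ small enough). Since $h \mapsto N_{n,h}$ is increasing we have

$$C_{n,\varepsilon_2} \subset \{N_{n,(1-\varepsilon_2)h_n} \le N_{n,H_n} \le N_{n,(1+\varepsilon_2)h_n}\},$$

and in view of Lemma 1 we can take $0 < \varepsilon_3 < \varepsilon_2$ such that

$$D_{n,(1-\varepsilon_2)h_n,0,K_R,\varepsilon_3} \cap D_{n,(1+\varepsilon_2)h_n,0,K_R,\varepsilon_3} \subset C_{n,\varepsilon_2}.$$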

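Using (A.1) with the slowly varying function $\ell_F(h) = F_\nu(h)h^{-(\beta+1)}$, we have for $n$ large enough that uniformly in $y \in [\frac12, \frac32]$

$$(1-\varepsilon_1)\ell_F(h_n) \le \ell_F(yh_n) \le (1+\varepsilon_1)\ell_F(h_n); \tag{6.16}$$

in particular, for $y = 1-\varepsilon_2$ and $y = 1+\varepsilon_2$ we get by the definition of $\varepsilon_2$ and since $\varepsilon_3 < \varepsilon_2 < \varepsilon_1$:

$$D_{n,(1-\varepsilon_2)h_n,0,K_R,\varepsilon_3} \cap D_{n,(1+\varepsilon_2)h_n,0,K_R,\varepsilon_3} \cap D_{n,h_n,0,K_R,\varepsilon_3} \subset F_{n,\varepsilon_1}.$$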

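Then we define for $\varepsilon_4 = \varepsilon_3 \wedge \frac{\varepsilon}{3(k+1)^2+\varepsilon}$ the event

$$A_{n,\varepsilon} \triangleq D_{n,(1-\varepsilon_2)h_n,0,K_R,\varepsilon_4} \cap D_{n,(1+\varepsilon_2)h_n,0,K_R,\varepsilon_4} \cap D_{n,h_n,0,K_R,\varepsilon_4} \cap \bigcap_{a=0}^{2k} D_{n,h_n,a,K,\varepsilon_4},$$

which satisfies (6.10) in view of the previous embeddings. Using inequality (6.7) in Lemma 5 and since $\varepsilon_4 < \varepsilon_2 < \varepsilon_1 < \frac12$, we get

$$\mathbb{P}^n_\mu\{A^c_{n,\varepsilon}\} \le 4(k+2)\exp\big(-c_{\beta,K,\varepsilon}\,\omega^{-2}(h_n)\big),$$

where we used (6.16) and (2.5). □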
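The positivity $\lambda_{\beta,K} > 0$ established above can be checked numerically through (6.12). Since (4.1) is not reproduced in this section, the sketch assumes, as the quadratic form in the proof of Lemma 6 indicates, that the limit matrix has entries $K_{j+l,\beta}$; it uses the kernel $K = \mathbf{1}_{[-1,1]}$, for which $K_{a,\beta} = (1+(-1)^a)/(a+\beta+1)$, and a cyclic Jacobi eigenvalue iteration written out in plain Python. The helper names are hypothetical.

```python
import math


def moment_matrix(k, beta):
    """Assumed limit matrix with entries K_{j+l,beta}, 0 <= j, l <= k, for K = 1 on [-1, 1]."""
    return [[(1 + (-1) ** (j + l)) / (j + l + beta + 1) for l in range(k + 1)]
            for j in range(k + 1)]


def jacobi_eigenvalues(A, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations M <- J^T M J."""
    n = len(A)
    M = [list(row) for row in A]
    for _ in range(max_sweeps):
        off = sum(M[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if M[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the new (p, q) entry vanishes
                theta = (M[q][q] - M[p][p]) / (2.0 * M[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for r in range(n):  # M <- M J
                    mrp, mrq = M[r][p], M[r][q]
                    M[r][p] = c * mrp - s * mrq
                    M[r][q] = s * mrp + c * mrq
                for r in range(n):  # M <- J^T M
                    mpr, mqr = M[p][r], M[q][r]
                    M[p][r] = c * mpr - s * mqr
                    M[q][r] = s * mpr + c * mqr
    return sorted(M[i][i] for i in range(n))


def lambda_beta_K(k, beta):
    """Smallest eigenvalue, i.e. the infimum of <x, M x> over unit vectors x, as in (6.12)."""
    return jacobi_eigenvalues(moment_matrix(k, beta))[0]


for k in (1, 2, 3):
    print(k, lambda_beta_K(k, beta=0.0))
```

The smallest eigenvalue decreases quickly with $k$ (moment matrices are badly conditioned), which is why the constant $(\lambda_{\beta,K}-\varepsilon)^{-p}$ in the upper bound can be large for high degrees.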

6.4. Proof of the lower bounds 

Lemma 7. If there are two elements $f_0$ and $f_1$ of a class $\Sigma$ such that the Kullback–Leibler distance between the corresponding probabilities $\mathbb{P}_0$ and $\mathbb{P}_1$ satisfies $\mathcal{K}(\mathbb{P}_0,\mathbb{P}_1) \le Q < +\infty$ with $|f_0(x_0) - f_1(x_0)| \ge 2cr_n$ for some constant $c > 0$, then the pointwise minimax risk $\mathcal{R}_n(\Sigma,\mu)$ over the class $\Sigma$ defined by (2.1) in the model (1.1) satisfies:

$$\mathcal{R}_n(\Sigma,\mu) \ge C(c,Q,p)\,r_n,$$

where $C(c,Q,p) \triangleq c\Big(\frac{e^{-Q}}{4} \vee \frac{1-\sqrt{Q/2}}{2}\Big)^{1/p}$.

This result is classical. It can be found in Tsybakov (2003) with a proof based on a reduction scheme with two hypotheses and inequalities between the Kullback–Leibler distance and other probability distances.

Proposition 6. Let $h_n$ be defined by (2.5), let $(a_n)$ be a sequence of positive numbers going to $+\infty$ and $r_n = \omega(h_n)$. If $\Sigma = \Sigma_{h_n,a_n}(x_0,\omega)$ is the class given by Definition 2, we have

$$\liminf_n r_n^{-1}\mathcal{R}_n(\Sigma,\mu) \ge C_{\sigma,p}. \tag{6.17}$$



Proof. We use Lemma 7. All we have to do is to find two functions $f_{0,n}$ and $f_{1,n}$ such that:

(1) there is some $0 < Q < +\infty$ such that $\mathcal{K}(\mathbb{P}^n_0,\mathbb{P}^n_1) \le Q$;

(2) $f_{0,n}, f_{1,n} \in \Sigma_{h_n,a_n}(x_0,\omega)$;

(3) $|f_{0,n}(x_0) - f_{1,n}(x_0)| \ge 2cr_n$ for some constant $c > 0$.

We choose the two following hypotheses:

$$f_{0,n}(x) = \omega(h_n)\mathbf{1}_{|x-x_0|\le h_n}, \qquad f_{1,n}(x) = \omega(|x-x_0|)\mathbf{1}_{|x-x_0|\le h_n}.$$
(1) Since the $\xi_i$ are centered Gaussian of variance $\sigma^2$ and independent of $\mathcal{X}_n$, we have:

$$\mathcal{K}(\mathbb{P}^n_0,\mathbb{P}^n_1 \mid \mathcal{X}_n) = \frac{1}{2\sigma^2}\sum_{i=1}^n\big(f_{0,n}(X_i) - f_{1,n}(X_i)\big)^2,$$

then in view of (2.5)

$$\mathcal{K}(\mathbb{P}^n_0,\mathbb{P}^n_1) = \frac{n}{2\sigma^2}\|f_{0,n} - f_{1,n}\|^2_{2,\mu} \le \frac{n}{\sigma^2}\,\omega^2(h_n)F_\nu(h_n) \le 1.$$

(2) For $h \in [0,h_n]$, taking $P$ as the constant polynomial equal to $\omega(h_n)$, we have that the continuity modulus of $f_{0,n}$ is 0, and taking $P = 0$ we obtain that the continuity modulus of $f_{1,n}$ is bounded by $\omega(h)$. Moreover, for $n$ large enough, we clearly have $f_{0,n}, f_{1,n} \in U(a_n)$ since $a_n \to +\infty$.

(3) If we take $c = 1/2$, we have $|f_{1,n}(x_0) - f_{0,n}(x_0)| = \omega(h_n) = 2cr_n$. □

6.5. Computations of the examples. For a given design density, we compute the minimax convergence rate $r_n$ by first giving an equivalent as $n \to +\infty$ of the smallest solution $h_n$ of

$$\omega^2(h)F_\nu(h) = \frac{\sigma^2}{4n},$$

and then an equivalent of $r_n = \omega(h_n)$.

6.5.1. Regularly varying design example. In the regularly varying design case we 
find the equivalent of h n using the following proposition. 

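Proposition 7. Let $\gamma > 0$ and $\alpha \in \mathbb{R}$. If $G(h) = h^\gamma(\log(1/h))^\alpha$, then we have:
$$G^{\leftarrow}(h) \sim \gamma^{\alpha/\gamma}h^{1/\gamma}(\log(1/h))^{-\alpha/\gamma} \quad\text{as } h \to 0^+.$$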



Proof. When $\alpha = 0$, the result is obvious, hence assume $\alpha \in \mathbb{R}\setminus\{0\}$. We look for $h$ such that $h^\gamma(\log(1/h))^\alpha = x$, when $x > 0$ is small. If $\alpha > 0$, we define $t = \log(h^{\gamma/\alpha})$, so this equation becomes

$$t\exp(t) = -\gamma x^{1/\alpha}/\alpha, \tag{6.18}$$

where $t < 0$. The equation (6.18) has two solutions for $x$ small enough, but they cannot be written in an explicit way. Then let us consider the Lambert function $W$ defined as the function satisfying $W(z)e^{W(z)} = z$ for any $z \in \mathbb{C}$. See, for instance, Corless et al. (1996) about this function. We are only interested here in its real branches. This function has two branches $W_0$ and $W_{-1}$ in $\mathbb{R}$. We denote by $W_0$ the one such that $W_0(0) = 0$ and by $W_{-1}$ the one such that $\lim_{h\to0^-}W_{-1}(h) = -\infty$. The two solutions of (6.18) are then $t_1 = W_{-1}(-\gamma x^{1/\alpha}/\alpha)$ and $t_2 = W_0(-\gamma x^{1/\alpha}/\alpha)$, and $h_0 = \exp(\alpha W_{-1}(-\gamma x^{1/\alpha}/\alpha)/\gamma)$ is the smallest solution. By definition of $W$ we have for $-1/e \le x < 0$ and $\alpha \in \mathbb{R}$: $e^{\alpha W_{-1}(x)} = (-x)^\alpha(-W_{-1}(x))^{-\alpha}$, and since $W_{-1}$ satisfies $W_{-1}(-x) \sim \log(x)$ as $x \to 0^+$, we have

$$h_0 = \big(\gamma x^{1/\alpha}/\alpha\big)^{\alpha/\gamma}\big(-W_{-1}(-\gamma x^{1/\alpha}/\alpha)\big)^{-\alpha/\gamma} \sim \gamma^{\alpha/\gamma}x^{1/\gamma}(\log(1/x))^{-\alpha/\gamma} \quad\text{as } x \to 0^+.$$

When $\alpha < 0$, we proceed similarly. We have $t > 0$ and (6.18) has a single solution $t = W_0(-\gamma x^{1/\alpha}/\alpha)$, thus $h = \exp(\alpha W_0(-\gamma x^{1/\alpha}/\alpha)/\gamma)$. By the definition of $W$ we have, for all $x > 0$ and $\alpha \in \mathbb{R}$: $e^{\alpha W_0(x)} = x^\alpha W_0(x)^{-\alpha}$, and since $W_0$ satisfies $W_0(x) \sim \log(x)$ as $x \to +\infty$, we find again $h \sim \gamma^{\alpha/\gamma}x^{1/\gamma}(\log(1/x))^{-\alpha/\gamma}$ as $x \to 0^+$. □
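The proof of Proposition 7 is constructive, so it can be turned into a small solver. The sketch below evaluates the two real branches $W_0$ and $W_{-1}$ of the Lambert function with Newton's method (a numerical scheme swapped in for illustration, not one prescribed by the paper or by Corless et al.), returns the smallest root of $h^\gamma(\log(1/h))^\alpha = x$ via (6.18), and compares it with the equivalent of Proposition 7. The function names are hypothetical, and the iterations are not tuned near the branch point $-1/e$.

```python
import math


def lambert_w0(z, tol=1e-15, max_iter=200):
    """Principal branch W_0(z) for z > 0 (Newton's method started at log(1 + z))."""
    if z <= 0:
        raise ValueError("this sketch only handles z > 0")
    w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    return w


def lambert_wm1(z, tol=1e-15, max_iter=200):
    """Lower branch W_{-1}(z) for -1/e < z < 0 (Newton from log(-z) - log(-log(-z)))."""
    if not -1.0 / math.e < z < 0.0:
        raise ValueError("this sketch needs -1/e < z < 0")
    L = math.log(-z)
    w = L - math.log(-L)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w = min(w - step, -1.0 - 1e-12)  # stay on the branch w < -1
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    return w


def smallest_root(x, gamma, alpha):
    """Smallest h in (0, 1) with h^gamma * log(1/h)^alpha = x, as in the proof above."""
    if alpha == 0:
        return x ** (1.0 / gamma)
    z = -gamma * x ** (1.0 / alpha) / alpha  # right-hand side of (6.18)
    w = lambert_wm1(z) if alpha > 0 else lambert_w0(z)
    return math.exp(alpha * w / gamma)


def equivalent(x, gamma, alpha):
    """gamma^(alpha/gamma) x^(1/gamma) log(1/x)^(-alpha/gamma), as in Proposition 7."""
    return gamma ** (alpha / gamma) * x ** (1.0 / gamma) * math.log(1.0 / x) ** (-alpha / gamma)


for x in (1e-4, 1e-8, 1e-16):
    h = smallest_root(x, 2.0, 1.0)
    print(x, h, h / equivalent(x, 2.0, 1.0))
```

The printed ratio approaches 1 only slowly, at a logarithmic speed, so the equivalent should be read as an asymptotic statement rather than a sharp approximation for moderate $x$.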

For the second example of regularly varying design, using Proposition 7, we find that an equivalent to the sequence $h_n$ defined by (2.5) is

$$(1+2s+\beta)^{(\alpha+2\gamma)/(1+2s+\beta)}\,(\sigma/r)^{2/(1+2s+\beta)}\,\big(n(\log n)^{\alpha+2\gamma}\big)^{-1/(1+2s+\beta)},$$

and since $\omega(h) = rh^s(\log(1/h))^\gamma$, we find that an equivalent of $r_n$ (up to a constant depending on $s, \beta, \gamma, \alpha$) is

$$\sigma^{2s/(1+2s+\beta)}\,r^{(\beta+1)/(1+2s+\beta)}\,\big(n(\log n)^{\alpha-\gamma(1+\beta)/s}\big)^{-s/(1+2s+\beta)}.$$

The computation for the third example ($\beta = -1$) is similar to the second example, since $F_\nu(h) = (\log(1/h))^{1-\alpha}$.

6.5.2. Γ-varying design example. For the Γ-varying design example $\nu(h) = \exp(-1/h^\alpha)$, we first use the fact that when $\nu \in \Gamma\mathrm{V}(\rho)$, we have $F_\nu(h) \sim \rho(h)\nu(h)$ as $h \to 0^+$ (see the Appendix). Recalling that $\rho(h) = h^{\alpha+1}/\alpha$, we solve

$$h^{1+2s+\alpha}\exp(-1/h^\alpha) = y_n, \tag{6.19}$$

where $y_n = \sigma^2\alpha/(4r^2n)$.

Defining $t = h^{-\alpha}$, equation (6.19) becomes $t^{-(1+2s+\alpha)/\alpha}\exp(-t) = y_n$, which we rewrite as $x\exp(x) = \frac{\alpha}{1+2s+\alpha}\,y_n^{-\alpha/(1+2s+\alpha)}$ for $x = \frac{\alpha}{1+2s+\alpha}\,t$. Then we have $x = W_0\big(\frac{\alpha}{1+2s+\alpha}\,y_n^{-\alpha/(1+2s+\alpha)}\big)$, where $W_0$ is defined in the proof of Proposition 7. Using the fact that $W_0(x) \sim \log(x)$ as $x \to +\infty$, we get $x \sim \frac{\alpha}{1+2s+\alpha}\log n$ as $n \to +\infty$, thus $h_n \sim (\log n)^{-1/\alpha}$ and the result holds since $r_n = rh_n^s$.
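Equation (6.19) can be solved through $W_0$ exactly along the lines of the computation above. In this sketch the level is passed directly as the argument `y`, so no particular normalisation of (2.5) is assumed, and $W_0$ is again evaluated with Newton's method (an illustrative choice). The printed product $h\,(\log(1/y))^{1/\alpha}$ illustrates the slow convergence behind $h_n \sim (\log n)^{-1/\alpha}$. The function names are hypothetical.

```python
import math


def lambert_w0(z, tol=1e-15, max_iter=200):
    """Principal branch W_0(z) for z > 0 (Newton's method started at log(1 + z))."""
    if z <= 0:
        raise ValueError("this sketch only handles z > 0")
    w = math.log1p(z)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - z) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * max(1.0, abs(w)):
            break
    return w


def solve_619(y, s, alpha):
    """The solution h > 0 of h^(1+2s+alpha) exp(-h^(-alpha)) = y (left side increasing in h)."""
    d = 1.0 + 2.0 * s + alpha
    z = alpha / d * y ** (-alpha / d)
    x = lambert_w0(z)   # x exp(x) = z
    t = d * x / alpha   # t = h^(-alpha)
    return t ** (-1.0 / alpha)


alpha = 1.0
for y in (1e-6, 1e-12, 1e-24):
    h = solve_619(y, s=1.0, alpha=alpha)
    print(y, h, h * math.log(1.0 / y) ** (1.0 / alpha))
```

Since $y_n$ is proportional to $1/n$, $\log(1/y_n) \sim \log n$, so the last column tends to 1 as $y \to 0^+$, though very slowly.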

Appendix A. Some Facts on Regular and Γ-Variation

We recall here some results about regularly and Γ-varying functions. The results stated in this section can be found in Bingham et al. (1989), Geluk and de Haan (1987), and Seneta (1976).

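A.1. Regular variation. Let $\ell$ be a slowly varying function throughout the following. An important result is that the property

$$\lim_{h\to0^+}\ell(yh)/\ell(h) = 1 \tag{A.1}$$

holds uniformly for $y$ in any compact set in $(0,+\infty)$. Now if $R_1 \in \mathrm{RV}(\alpha_1)$ and $R_2 \in \mathrm{RV}(\alpha_2)$, one has

(1) $R_1 \times R_2 \in \mathrm{RV}(\alpha_1+\alpha_2)$,

(2) $R_1 \circ R_2 \in \mathrm{RV}(\alpha_1 \times \alpha_2)$ (when $R_2(h) \to 0$ as $h \to 0^+$).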
If $R \in \mathrm{RV}(\gamma)$ for $\gamma \in \mathbb{R}\setminus\{0\}$, then

$$\lim_{h\to0^+}R(h) = \begin{cases} 0 & \text{if } \gamma > 0, \\ +\infty & \text{if } \gamma < 0. \end{cases} \tag{A.2}$$

The asymptotic behaviour of integrals of regularly varying functions, usually called 
Abelian theorems, plays a key role in the proofs. 

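• If $\gamma > -1$ we have

$$\int_0^h t^\gamma\ell(t)\,dt \sim (1+\gamma)^{-1}h^{1+\gamma}\ell(h) \quad\text{as } h \to 0^+, \tag{A.3}$$

and, in particular, $h \mapsto \int_0^h t^\gamma\ell(t)\,dt \in \mathrm{RV}(\gamma+1)$. This result is known as the Karamata theorem.

• When $\gamma = -1$ and if $\int_0^\eta \ell(t)\,dt/t < +\infty$ for some $\eta > 0$, then $h \mapsto \int_0^h \ell(t)\,dt/t \in \mathrm{RV}(0)$ and we have

$$\lim_{h\to0^+}\frac{1}{\ell(h)}\int_0^h \ell(t)\,\frac{dt}{t} = +\infty.$$

• If $R$ is some positive monotone function such that $h \mapsto \int_0^h R(t)\,dt$ belongs to $\mathrm{RV}(\gamma)$ for some $\gamma > 0$, then $R \in \mathrm{RV}(\gamma-1)$.

• If $K$ is a function such that $\int_0^1 t^{-\delta}K(t)\,dt < +\infty$ for some $\delta > 0$, then

$$\int_0^1 K(t)\ell(th)\,dt \sim \ell(h)\int_0^1 K(t)\,dt \quad\text{as } h \to 0^+. \tag{A.4}$$

Moreover, when $\int_0^\eta \ell(t)\,dt/t < +\infty$ for some $\eta > 0$, and $K$ is such that for all $t > 0$, $|K(t) - K(0)| \le \varrho|t|^\kappa$ for some $\varrho > 0$ and $\kappa > 0$, one has

$$\int_0^h K(t/h)\ell(t)\,\frac{dt}{t} \sim K(0)\int_0^h \ell(t)\,\frac{dt}{t} \quad\text{as } h \to 0^+. \tag{A.5}$$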
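As a sanity check of the Karamata theorem (A.3), take the slowly varying function $\ell(t) = \log(1/t)$, for which $\int_0^h t^\gamma\log(1/t)\,dt = h^{\gamma+1}\big(\log(1/h)/(\gamma+1) + 1/(\gamma+1)^2\big)$ when $\gamma > -1$. The sketch below compares a midpoint-rule value with this closed form and shows that the ratio to the Karamata equivalent equals $1 + 1/((\gamma+1)\log(1/h))$, which tends to 1 as $h \to 0^+$, slowly. The function names and the step count are illustrative choices.

```python
import math


def integral_midpoint(h, gamma, steps=100_000):
    """Midpoint-rule value of int_0^h t^gamma log(1/t) dt."""
    dt = h / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        total += t ** gamma * math.log(1.0 / t)
    return total * dt


def integral_exact(h, gamma):
    """Closed form h^(gamma+1) [log(1/h)/(gamma+1) + 1/(gamma+1)^2], gamma > -1."""
    g1 = gamma + 1.0
    return h ** g1 * (math.log(1.0 / h) / g1 + 1.0 / g1 ** 2)


def karamata_equivalent(h, gamma):
    """Right-hand side of (A.3) with the slowly varying function l(t) = log(1/t)."""
    return h ** (gamma + 1.0) * math.log(1.0 / h) / (gamma + 1.0)


print(integral_midpoint(1e-2, 0.5), integral_exact(1e-2, 0.5))
for h in (1e-2, 1e-4, 1e-8, 1e-16):
    print(h, integral_exact(h, 0.5) / karamata_equivalent(h, 0.5))
```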

If $R$ is defined and bounded on $[0,+\infty)$, one can define the generalized inverse as

$$R^{\leftarrow}(y) = \inf\{h \ge 0 \text{ such that } R(h) \ge y\}. \tag{A.6}$$

If $R \in \mathrm{RV}(\gamma)$ for some $\gamma > 0$, then there exists $R^- \in \mathrm{RV}(1/\gamma)$ such that

$$R(R^-(h)) \sim R^-(R(h)) \sim h \quad\text{as } h \to 0^+, \tag{A.7}$$

and $R^-$ is unique up to an asymptotic equivalence. Moreover, one version of $R^-$ is $R^{\leftarrow}$.

If $(\delta_n)_{n\ge0}$ and $(\lambda_n)_{n\ge0}$ are sequences of positive numbers such that $\delta_{n+1} \sim \delta_n$ as $n \to +\infty$, $\lim_n \delta_n = 0$, and if there is a positive and continuous function $\phi$ such that for any $y > 0$

$$\lim_n \lambda_n R(y\delta_n) = \phi(y), \tag{A.8}$$

then $R$ varies regularly.
A.2. Γ-variation. We now describe the properties of Γ-varying functions and Π-varying functions. The results are due to de Haan. The references are the same as for regular variation. All the following results can be found therein.

The first result states that if $\nu$ is a function such that (2.6) holds for all $y \in \mathbb{R}$, then (2.6) holds uniformly on each compact set in $\mathbb{R}$. If $\rho$ is such that (2.6) holds, then

$$\lim_{h\to0^+}\rho(h)/h = 0. \tag{A.9}$$

The auxiliary function $\rho$ in definition (2.6) is unique up to asymptotic equivalence and can be taken as $h \mapsto \int_0^h \nu(t)\,dt/\nu(h)$.

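The class $\Gamma\mathrm{V}(\rho)$ is closed under integration. If $\nu \in \Gamma\mathrm{V}(\rho)$, then $F_\nu(h) = \int_0^h \nu(t)\,dt \in \Gamma\mathrm{V}(\rho)$ and we have

$$F_\nu(h) \sim \rho(h)\nu(h) \quad\text{as } h \to 0^+.$$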
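For the Γ-varying example of 6.5.2, $\nu(h) = \exp(-1/h^\alpha)$, with auxiliary function $\rho(h) = h^{\alpha+1}/\alpha$ (which equals $\nu(h)/\nu'(h)$ for this $\nu$), both the defining property (2.6) with $y = 1$ and the equivalence $F_\nu \sim \rho\nu$ stated above can be observed numerically. The Simpson quadrature and its step count are illustrative choices.

```python
import math


def nu(h, alpha):
    """nu(h) = exp(-1/h^alpha), the Gamma-varying design example of 6.5.2."""
    return math.exp(-h ** (-alpha))


def rho(h, alpha):
    """Auxiliary function h^(alpha + 1) / alpha, equal to nu / nu' for this nu."""
    return h ** (alpha + 1.0) / alpha


def F_nu(h, alpha, steps=20_000):
    """Composite Simpson value of int_0^h nu(t) dt (steps must be even; nu vanishes at 0)."""
    dt = h / steps
    total = nu(h, alpha)  # endpoint t = 0 contributes 0
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * nu(i * dt, alpha)
    return total * dt / 3.0


alpha = 1.0
for h in (0.05, 0.02, 0.01):
    gamma_ratio = nu(h + rho(h, alpha), alpha) / nu(h, alpha)  # tends to e^1 by (2.6)
    print(h, gamma_ratio / math.e, F_nu(h, alpha) / (rho(h, alpha) * nu(h, alpha)))
```

For $\alpha = 1$ both printed ratios are of the form $1 - O(h)$; for instance $\nu(h + \rho(h))/\nu(h) = \exp(1/(1+h))$ exactly.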

We have seen that the class of regularly varying functions RV is closed under the operation of functional inversion. In the case of Γ-variation, the inversion maps the class ΓV into another class of functions, namely the de Haan class ΠV.

Definition 5 (Π-Variation). A function $\nu$ is in the de Haan class ΠV if there exist a slowly varying function $\ell$ and a positive real number $c$ such that

$$\forall y > 0, \quad \lim_{h\to0^+}\frac{\nu(yh) - \nu(h)}{\ell(h)} = c\log(y). \tag{A.10}$$

The class of functions $\nu$ satisfying (A.10) is denoted by $\Pi\mathrm{V}(\ell)$.

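• If $\nu \in \Gamma\mathrm{V}(\rho)$, then $\ell = \rho\circ\nu^{\leftarrow}$ is slowly varying and $\nu^{\leftarrow} \in \Pi\mathrm{V}(\ell)$.

• If $\nu \in \Pi\mathrm{V}(\ell)$ for some $\ell \in \mathrm{RV}(0)$, then $\nu^{\leftarrow} \in \Gamma\mathrm{V}(\rho)$ with $\rho = \ell\circ\nu^{\leftarrow}$.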

In both senses the inverses and their auxiliary functions are asymptotically unique. The following inclusion tells us that Π-variation can be viewed as a refinement of slow variation. Actually, any Π-varying function is slowly varying: for any $\ell \in \mathrm{RV}(0)$ we have

$$\Pi\mathrm{V}(\ell) \subset \mathrm{RV}(0). \tag{A.11}$$